linux-mm.kvack.org archive mirror
* [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
@ 2025-05-13 16:34 Fuad Tabba
  2025-05-13 16:34 ` [PATCH v9 01/17] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
                   ` (16 more replies)
  0 siblings, 17 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Main changes since v8 [1]:
- Added guest_memfd flag that toggles support for in-place shared memory
- Added best-effort validation that the userspace memory address range
  matches the shared memory backed by guest_memfd
- Reworked the handling of faults for shared guest_memfd memory on x86
- Fixed issues based on feedback from the previous series
- Rebased on Linux 6.15-rc6

The purpose of this series is to allow mapping guest_memfd backed memory
at the host. This support enables VMMs like Firecracker to run guests
backed completely by guest_memfd [2]. Combined with Patrick's series for
direct map removal in guest_memfd [3], this would allow running VMs that
offer additional hardening against Spectre-like transient execution
attacks.

This series will also serve as a base for _restricted_ mmap() support
for guest_memfd backed memory at the host for CoCo (confidential
computing) VMs that allow sharing guest memory in-place with the
host [4].

Patches 1 to 6 are mainly about decoupling the concept of guest memory
being private vs guest memory being backed by guest_memfd. They are
mostly refactoring and renaming.

Patches 7 and 8 add support for in-place shared memory, as well as the
ability for the host to map it as long as it is shared. This is gated by
a new configuration option, toggled by a new flag, and advertised to
userspace by a new capability (introduced in patch 15).

Patches 9 to 14 add x86 and arm64 support for in-place shared memory.

Patch 15 introduces the capability that advertises support for in-place
shared memory, and updates the documentation.

Patches 16 and 17 add new selftests for the added features.

For details on how to test this patch series, and on how to boot a guest
that uses the new features, please refer to v8 [1].

Cheers,
/fuad

[1] https://lore.kernel.org/all/20250430165655.605595-1-tabba@google.com/
[2] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[3] https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
[4] https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/

Ackerley Tng (4):
  KVM: guest_memfd: Check that userspace_addr and fd+offset refer to
    same range
  KVM: x86/mmu: Handle guest page faults for guest_memfd with shared
    memory
  KVM: x86: Compute max_mapping_level with input from guest_memfd
  KVM: selftests: Test guest_memfd same-range validation

Fuad Tabba (13):
  KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
  KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
    CONFIG_KVM_GENERIC_GMEM_POPULATE
  KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
  KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
  KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
  KVM: Fix comments that refer to slots_lock
  KVM: guest_memfd: Allow host to map guest_memfd() pages
  KVM: arm64: Refactor user_mem_abort() calculation of force_pte
  KVM: arm64: Rename variables in user_mem_abort()
  KVM: arm64: Handle guest_memfd()-backed guest page faults
  KVM: arm64: Enable mapping guest_memfd in arm64
  KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
  KVM: selftests: guest_memfd mmap() test when mapping is allowed

 Documentation/virt/kvm/api.rst                |  18 +
 arch/arm64/include/asm/kvm_host.h             |  10 +
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/mmu.c                          | 149 +++++----
 arch/x86/include/asm/kvm_host.h               |  22 +-
 arch/x86/kvm/Kconfig                          |   4 +-
 arch/x86/kvm/mmu/mmu.c                        | 135 +++++---
 arch/x86/kvm/svm/sev.c                        |   4 +-
 arch/x86/kvm/svm/svm.c                        |   4 +-
 arch/x86/kvm/x86.c                            |   3 +-
 include/linux/kvm_host.h                      |  76 ++++-
 include/uapi/linux/kvm.h                      |   2 +
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 313 ++++++++++++++++--
 virt/kvm/Kconfig                              |  15 +-
 virt/kvm/Makefile.kvm                         |   2 +-
 virt/kvm/guest_memfd.c                        | 152 ++++++++-
 virt/kvm/kvm_main.c                           |  21 +-
 virt/kvm/kvm_mm.h                             |   4 +-
 19 files changed, 753 insertions(+), 183 deletions(-)


base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v9 01/17] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  7:14   ` Gavin Shan
  2025-05-13 16:34 ` [PATCH v9 02/17] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The option KVM_PRIVATE_MEM enables guest_memfd in general. Subsequent
patches add shared memory support to guest_memfd. Therefore, rename it
to KVM_GMEM to make its purpose clearer.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 include/linux/kvm_host.h        | 10 +++++-----
 virt/kvm/Kconfig                |  8 ++++----
 virt/kvm/Makefile.kvm           |  2 +-
 virt/kvm/kvm_main.c             |  4 ++--
 virt/kvm/kvm_mm.h               |  4 ++--
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7bc174a1f1cb..52f6f6d08558 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2253,7 +2253,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 		       int tdp_max_root_level, int tdp_huge_page_level);
 
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
 #else
 #define kvm_arch_has_private_mem(kvm) false
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 291d49b9bf05..d6900995725d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -601,7 +601,7 @@ struct kvm_memory_slot {
 	short id;
 	u16 as_id;
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	struct {
 		/*
 		 * Writes protected by kvm->slots_lock.  Acquiring a
@@ -722,7 +722,7 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
  * Arch code must define kvm_arch_has_private_mem if support for private memory
  * is enabled.
  */
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
+#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
 static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 {
 	return false;
@@ -2504,7 +2504,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
-	return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
+	return IS_ENABLED(CONFIG_KVM_GMEM) &&
 	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
 #else
@@ -2514,7 +2514,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 }
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
@@ -2527,7 +2527,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 727b542074e7..49df4e32bff7 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -112,19 +112,19 @@ config KVM_GENERIC_MEMORY_ATTRIBUTES
        depends on KVM_GENERIC_MMU_NOTIFIER
        bool
 
-config KVM_PRIVATE_MEM
+config KVM_GMEM
        select XARRAY_MULTI
        bool
 
 config KVM_GENERIC_PRIVATE_MEM
        select KVM_GENERIC_MEMORY_ATTRIBUTES
-       select KVM_PRIVATE_MEM
+       select KVM_GMEM
        bool
 
 config HAVE_KVM_ARCH_GMEM_PREPARE
        bool
-       depends on KVM_PRIVATE_MEM
+       depends on KVM_GMEM
 
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
-       depends on KVM_PRIVATE_MEM
+       depends on KVM_GMEM
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 724c89af78af..8d00918d4c8b 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -12,4 +12,4 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
-kvm-$(CONFIG_KVM_PRIVATE_MEM) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_KVM_GMEM) += $(KVM)/guest_memfd.o
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e85b33a92624..4996cac41a8f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4842,7 +4842,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_MEMORY_ATTRIBUTES:
 		return kvm_supported_mem_attributes(kvm);
 #endif
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
@@ -5276,7 +5276,7 @@ static long kvm_vm_ioctl(struct file *filp,
 	case KVM_GET_STATS_FD:
 		r = kvm_vm_ioctl_get_stats_fd(kvm);
 		break;
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	case KVM_CREATE_GUEST_MEMFD: {
 		struct kvm_create_guest_memfd guest_memfd;
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index acef3f5c582a..ec311c0d6718 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -67,7 +67,7 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 }
 #endif /* HAVE_KVM_PFNCACHE */
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 void kvm_gmem_init(struct module *module);
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -91,6 +91,6 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 {
 	WARN_ON_ONCE(1);
 }
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
 
 #endif /* __KVM_MM_H__ */
-- 
2.49.0.1045.g170613ef41-goog




* [PATCH v9 02/17] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
  2025-05-13 16:34 ` [PATCH v9 01/17] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-13 21:56   ` Ira Weiny
  2025-05-21  7:14   ` Gavin Shan
  2025-05-13 16:34 ` [PATCH v9 03/17] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
clearer.

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/Kconfig     | 4 ++--
 include/linux/kvm_host.h | 2 +-
 virt/kvm/Kconfig         | 2 +-
 virt/kvm/guest_memfd.c   | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fe8ea8c097de..b37258253543 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -46,7 +46,7 @@ config KVM_X86
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
 	select KVM_GENERIC_PRE_FAULT_MEMORY
-	select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+	select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
 	select KVM_WERROR if WERROR
 
 config KVM
@@ -145,7 +145,7 @@ config KVM_AMD_SEV
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	select ARCH_HAS_CC_PLATFORM
-	select KVM_GENERIC_PRIVATE_MEM
+	select KVM_GENERIC_GMEM_POPULATE
 	select HAVE_KVM_ARCH_GMEM_PREPARE
 	select HAVE_KVM_ARCH_GMEM_INVALIDATE
 	help
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d6900995725d..7ca23837fa52 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2533,7 +2533,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
 #endif
 
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 /**
  * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
  *
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 49df4e32bff7..559c93ad90be 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -116,7 +116,7 @@ config KVM_GMEM
        select XARRAY_MULTI
        bool
 
-config KVM_GENERIC_PRIVATE_MEM
+config KVM_GENERIC_GMEM_POPULATE
        select KVM_GENERIC_MEMORY_ATTRIBUTES
        select KVM_GMEM
        bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..befea51bbc75 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -638,7 +638,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
 {
-- 
2.49.0.1045.g170613ef41-goog




* [PATCH v9 03/17] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
  2025-05-13 16:34 ` [PATCH v9 01/17] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
  2025-05-13 16:34 ` [PATCH v9 02/17] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  7:15   ` Gavin Shan
  2025-05-13 16:34 ` [PATCH v9 04/17] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The function kvm_arch_has_private_mem() is used to indicate whether
guest_memfd is supported by the architecture, which until now has
implied that its memory is private. To decouple guest_memfd support from
whether the memory is private, rename this function to
kvm_arch_supports_gmem().

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 8 ++++----
 arch/x86/kvm/mmu/mmu.c          | 8 ++++----
 include/linux/kvm_host.h        | 6 +++---
 virt/kvm/kvm_main.c             | 6 +++---
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52f6f6d08558..4a83fbae7056 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2254,9 +2254,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 
 #ifdef CONFIG_KVM_GMEM
-#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
 #else
-#define kvm_arch_has_private_mem(kvm) false
+#define kvm_arch_supports_gmem(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
@@ -2309,8 +2309,8 @@ enum {
 #define HF_SMM_INSIDE_NMI_MASK	(1 << 2)
 
 # define KVM_MAX_NR_ADDRESS_SPACES	2
-/* SMM is currently unsupported for guests with private memory. */
-# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)
+/* SMM is currently unsupported for guests with guest_memfd (esp private) memory. */
+# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_supports_gmem(kvm) ? 1 : 2)
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
 #else
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d1b632e33d2..b66f1bf24e06 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4917,7 +4917,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 	if (r)
 		return r;
 
-	if (kvm_arch_has_private_mem(vcpu->kvm) &&
+	if (kvm_arch_supports_gmem(vcpu->kvm) &&
 	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))
 		error_code |= PFERR_PRIVATE_ACCESS;
 
@@ -7705,7 +7705,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
 	 * Zapping SPTEs in this case ensures KVM will reassess whether or not
 	 * a hugepage can be used for affected ranges.
 	 */
-	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+	if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
 		return false;
 
 	if (WARN_ON_ONCE(range->end <= range->start))
@@ -7784,7 +7784,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 	 * a range that has PRIVATE GFNs, and conversely converting a range to
 	 * SHARED may now allow hugepages.
 	 */
-	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+	if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
 		return false;
 
 	/*
@@ -7840,7 +7840,7 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
 {
 	int level;
 
-	if (!kvm_arch_has_private_mem(kvm))
+	if (!kvm_arch_supports_gmem(kvm))
 		return;
 
 	for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7ca23837fa52..6ca7279520cf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -719,11 +719,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
 #endif
 
 /*
- * Arch code must define kvm_arch_has_private_mem if support for private memory
+ * Arch code must define kvm_arch_supports_gmem if support for guest_memfd
  * is enabled.
  */
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
-static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
+#if !defined(kvm_arch_supports_gmem) && !IS_ENABLED(CONFIG_KVM_GMEM)
+static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
 {
 	return false;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4996cac41a8f..2468d50a9ed4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1531,7 +1531,7 @@ static int check_memory_region_flags(struct kvm *kvm,
 {
 	u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
-	if (kvm_arch_has_private_mem(kvm))
+	if (kvm_arch_supports_gmem(kvm))
 		valid_flags |= KVM_MEM_GUEST_MEMFD;
 
 	/* Dirty logging private memory is not currently supported. */
@@ -2362,7 +2362,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static u64 kvm_supported_mem_attributes(struct kvm *kvm)
 {
-	if (!kvm || kvm_arch_has_private_mem(kvm))
+	if (!kvm || kvm_arch_supports_gmem(kvm))
 		return KVM_MEMORY_ATTRIBUTE_PRIVATE;
 
 	return 0;
@@ -4844,7 +4844,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 #ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
-		return !kvm || kvm_arch_has_private_mem(kvm);
+		return !kvm || kvm_arch_supports_gmem(kvm);
 #endif
 	default:
 		break;
-- 
2.49.0.1045.g170613ef41-goog




* [PATCH v9 04/17] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (2 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 03/17] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  7:15   ` Gavin Shan
  2025-05-13 16:34 ` [PATCH v9 05/17] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The bool has_private_mem is used to indicate whether guest_memfd is
supported. Rename it to supports_gmem to make its meaning clearer and to
decouple the concept of private memory from guest_memfd.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 4 ++--
 arch/x86/kvm/mmu/mmu.c          | 2 +-
 arch/x86/kvm/svm/svm.c          | 4 ++--
 arch/x86/kvm/x86.c              | 3 +--
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a83fbae7056..709cc2a7ba66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1331,7 +1331,7 @@ struct kvm_arch {
 	unsigned int indirect_shadow_pages;
 	u8 mmu_valid_gen;
 	u8 vm_type;
-	bool has_private_mem;
+	bool supports_gmem;
 	bool has_protected_state;
 	bool pre_fault_allowed;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
@@ -2254,7 +2254,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 
 #ifdef CONFIG_KVM_GMEM
-#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
 #else
 #define kvm_arch_supports_gmem(kvm) false
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b66f1bf24e06..69bf2ef22ed0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3486,7 +3486,7 @@ static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault
 	 * on RET_PF_SPURIOUS until the update completes, or an actual spurious
 	 * case might go down the slow path. Either case will resolve itself.
 	 */
-	if (kvm->arch.has_private_mem &&
+	if (kvm->arch.supports_gmem &&
 	    fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
 		return false;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a89c271a1951..a05b7dc7b717 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5110,8 +5110,8 @@ static int svm_vm_init(struct kvm *kvm)
 			(type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
 		to_kvm_sev_info(kvm)->need_init = true;
 
-		kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
-		kvm->arch.pre_fault_allowed = !kvm->arch.has_private_mem;
+		kvm->arch.supports_gmem = (type == KVM_X86_SNP_VM);
+		kvm->arch.pre_fault_allowed = !kvm->arch.supports_gmem;
 	}
 
 	if (!pause_filter_count || !pause_filter_thresh)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9896fd574bfc..12433b1e755b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12716,8 +12716,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		return -EINVAL;
 
 	kvm->arch.vm_type = type;
-	kvm->arch.has_private_mem =
-		(type == KVM_X86_SW_PROTECTED_VM);
+	kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
 	/* Decided by the vendor code for other VM types.  */
 	kvm->arch.pre_fault_allowed =
 		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
-- 
2.49.0.1045.g170613ef41-goog




* [PATCH v9 05/17] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (3 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 04/17] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  7:16   ` Gavin Shan
  2025-05-13 16:34 ` [PATCH v9 06/17] KVM: Fix comments that refer to slots_lock Fuad Tabba
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The function kvm_slot_can_be_private() is used to check whether a memory
slot is backed by guest_memfd. Rename it to kvm_slot_has_gmem() to make
that clearer and to decouple the concept of private memory from
guest_memfd.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 4 ++--
 arch/x86/kvm/svm/sev.c   | 4 ++--
 include/linux/kvm_host.h | 2 +-
 virt/kvm/guest_memfd.c   | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 69bf2ef22ed0..2b6376986f96 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3283,7 +3283,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	bool is_private = kvm_slot_can_be_private(slot) &&
+	bool is_private = kvm_slot_has_gmem(slot) &&
 			  kvm_mem_is_private(kvm, gfn);
 
 	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
@@ -4496,7 +4496,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 {
 	int max_order, r;
 
-	if (!kvm_slot_can_be_private(fault->slot)) {
+	if (!kvm_slot_has_gmem(fault->slot)) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return -EFAULT;
 	}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a7a7dc507336..27759ca6d2f2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2378,7 +2378,7 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	mutex_lock(&kvm->slots_lock);
 
 	memslot = gfn_to_memslot(kvm, params.gfn_start);
-	if (!kvm_slot_can_be_private(memslot)) {
+	if (!kvm_slot_has_gmem(memslot)) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -4688,7 +4688,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
 	}
 
 	slot = gfn_to_memslot(kvm, gfn);
-	if (!kvm_slot_can_be_private(slot)) {
+	if (!kvm_slot_has_gmem(slot)) {
 		pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
 				    gpa);
 		return;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6ca7279520cf..d9616ee6acc7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -614,7 +614,7 @@ struct kvm_memory_slot {
 #endif
 };
 
-static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
+static inline bool kvm_slot_has_gmem(const struct kvm_memory_slot *slot)
 {
 	return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
 }
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index befea51bbc75..6db515833f61 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -654,7 +654,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 		return -EINVAL;
 
 	slot = gfn_to_memslot(kvm, start_gfn);
-	if (!kvm_slot_can_be_private(slot))
+	if (!kvm_slot_has_gmem(slot))
 		return -EINVAL;
 
 	file = kvm_gmem_get_file(slot);
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v9 06/17] KVM: Fix comments that refer to slots_lock
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (4 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 05/17] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  7:16   ` Gavin Shan
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Fix comments so that they refer to slots_lock instead of slots_locks
(remove trailing s).

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 2 +-
 virt/kvm/kvm_main.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d9616ee6acc7..ae70e4e19700 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -859,7 +859,7 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-	/* Protected by slots_locks (for writes) and RCU (for reads) */
+	/* Protected by slots_lock (for writes) and RCU (for reads) */
 	struct xarray mem_attr_array;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2468d50a9ed4..6289ea1685dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -333,7 +333,7 @@ void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
 	 * All current use cases for flushing the TLBs for a specific memslot
 	 * are related to dirty logging, and many do the TLB flush out of
 	 * mmu_lock. The interaction between the various operations on memslot
-	 * must be serialized by slots_locks to ensure the TLB flush from one
+	 * must be serialized by slots_lock to ensure the TLB flush from one
 	 * operation is observed by any other operation on the same memslot.
 	 */
 	lockdep_assert_held(&kvm->slots_lock);
-- 
2.49.0.1045.g170613ef41-goog




* [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (5 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 06/17] KVM: Fix comments that refer to slots_lock Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-13 18:37   ` Ackerley Tng
                     ` (6 more replies)
  2025-05-13 16:34 ` [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range Fuad Tabba
                   ` (9 subsequent siblings)
  16 siblings, 7 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

This patch enables support for shared memory in guest_memfd, including
mapping that memory from host userspace. This support is gated by the
configuration option KVM_GMEM_SHARED_MEM and toggled by the guest_memfd
flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
guest_memfd instance.

Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 10 ++++
 include/linux/kvm_host.h        | 13 +++++
 include/uapi/linux/kvm.h        |  1 +
 virt/kvm/Kconfig                |  5 ++
 virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
 5 files changed, 117 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 709cc2a7ba66..f72722949cae 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 #ifdef CONFIG_KVM_GMEM
 #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
+
+/*
+ * CoCo VMs with hardware support that use guest_memfd only for backing private
+ * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
+ */
+#define kvm_arch_vm_supports_gmem_shared_mem(kvm)			\
+	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&			\
+	 ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||		\
+	  (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
 #else
 #define kvm_arch_supports_gmem(kvm) false
+#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae70e4e19700..2ec89c214978 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
 }
 #endif
 
+/*
+ * Returns true if this VM supports shared mem in guest_memfd.
+ *
+ * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
+ * guest_memfd is enabled.
+ */
+#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
+static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
+{
+	return false;
+}
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6ae8ad8934b..9857022a0f0c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1UL << 0)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 559c93ad90be..f4e469a62a60 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
        depends on KVM_GMEM
+
+config KVM_GMEM_SHARED_MEM
+       select KVM_GMEM
+       bool
+       prompt "Enable in-place shared memory for guest_memfd"
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6db515833f61..8e6d1866b55e 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+
+static bool kvm_gmem_supports_shared(struct inode *inode)
+{
+	uint64_t flags = (uint64_t)inode->i_private;
+
+	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+}
+
+static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	filemap_invalidate_lock_shared(inode->i_mapping);
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		int err = PTR_ERR(folio);
+
+		if (err == -EAGAIN)
+			ret = VM_FAULT_RETRY;
+		else
+			ret = vmf_error(err);
+
+		goto out_filemap;
+	}
+
+	if (folio_test_hwpoison(folio)) {
+		ret = VM_FAULT_HWPOISON;
+		goto out_folio;
+	}
+
+	if (WARN_ON_ONCE(folio_test_large(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		clear_highpage(folio_page(folio, 0));
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+out_filemap:
+	filemap_invalidate_unlock_shared(inode->i_mapping);
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault_shared,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (!kvm_gmem_supports_shared(file_inode(file)))
+		return -ENODEV;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+#else
+#define kvm_gmem_mmap NULL
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -463,6 +544,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 flags = args->flags;
 	u64 valid_flags = 0;
 
+	if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
+		valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
 
@@ -501,6 +585,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	    offset + size > i_size_read(inode))
 		goto err;
 
+	if (kvm_gmem_supports_shared(inode) &&
+	    !kvm_arch_vm_supports_gmem_shared_mem(kvm))
+		goto err;
+
 	filemap_invalidate_lock(inode->i_mapping);
 
 	start = offset >> PAGE_SHIFT;
-- 
2.49.0.1045.g170613ef41-goog




* [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (6 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-13 20:30   ` James Houghton
  2025-05-14 17:39   ` David Hildenbrand
  2025-05-13 16:34 ` [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

On binding of a guest_memfd with a memslot, check that the slot's
userspace_addr and the requested fd and offset refer to the same memory
range.

This check is best-effort: nothing prevents userspace from later mapping
other memory at the address provided in slot->userspace_addr and breaking
guest operation.

Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 virt/kvm/guest_memfd.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8e6d1866b55e..2f499021df66 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	return __kvm_gmem_create(kvm, size, flags);
 }
 
+static bool kvm_gmem_is_same_range(struct kvm *kvm,
+				   struct kvm_memory_slot *slot,
+				   struct file *file, loff_t offset)
+{
+	struct mm_struct *mm = kvm->mm;
+	loff_t userspace_addr_offset;
+	struct vm_area_struct *vma;
+	bool ret = false;
+
+	mmap_read_lock(mm);
+
+	vma = vma_lookup(mm, slot->userspace_addr);
+	if (!vma)
+		goto out;
+
+	if (vma->vm_file != file)
+		goto out;
+
+	userspace_addr_offset = slot->userspace_addr - vma->vm_start;
+	ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset;
+out:
+	mmap_read_unlock(mm);
+
+	return ret;
+}
+
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 		  unsigned int fd, loff_t offset)
 {
@@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	    offset + size > i_size_read(inode))
 		goto err;
 
-	if (kvm_gmem_supports_shared(inode) &&
-	    !kvm_arch_vm_supports_gmem_shared_mem(kvm))
-		goto err;
+	if (kvm_gmem_supports_shared(inode)) {
+		if (!kvm_arch_vm_supports_gmem_shared_mem(kvm))
+			goto err;
+
+		if (slot->userspace_addr &&
+		    !kvm_gmem_is_same_range(kvm, slot, file, offset))
+			goto err;
+	}
 
 	filemap_invalidate_lock(inode->i_mapping);
 
-- 
2.49.0.1045.g170613ef41-goog




* [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (7 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  7:48   ` David Hildenbrand
  2025-05-13 16:34 ` [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd Fuad Tabba
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

For memslots backed by guest_memfd with shared memory support, the KVM
MMU always faults in pages from guest_memfd, and never from the
userspace_addr. This behavior is keyed off the guest_memfd flag
GUEST_MEMFD_FLAG_SUPPORT_SHARED, introduced earlier in this series,
which indicates that the guest_memfd instance supports in-place shared
memory.

This flag is only supported if the VM creating the guest_memfd instance
belongs to certain types determined by the architecture. For now, only
non-CoCo VMs are permitted to use guest_memfd with shared memory.

Function names have also been updated for accuracy:
kvm_mem_is_private() returns true only when the current private/shared
state (in the CoCo sense) of the memory is private, and returns false if
the current state is shared, whether explicitly or implicitly, e.g.,
because the memory belongs to a non-CoCo VM.

kvm_mmu_faultin_pfn_private() is renamed to kvm_mmu_faultin_pfn_gmem()
to indicate that it can be used to fault in not just private memory but,
more generally, memory from guest_memfd.

Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 33 ++++++++++++++++++---------------
 include/linux/kvm_host.h | 33 +++++++++++++++++++++++++++++++--
 virt/kvm/guest_memfd.c   | 17 +++++++++++++++++
 3 files changed, 66 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2b6376986f96..cfbb471f7c70 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4465,21 +4465,25 @@ static inline u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
-					u8 max_level, int gmem_order)
+static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
+					    struct kvm_page_fault *fault,
+					    int order)
 {
-	u8 req_max_level;
+	u8 max_level = fault->max_level;
 
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
+	max_level = min(kvm_max_level_for_order(order), max_level);
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
-	if (req_max_level)
-		max_level = min(max_level, req_max_level);
+	if (fault->is_private) {
+		u8 level = kvm_x86_call(private_max_mapping_level)(kvm, fault->pfn);
+
+		if (level)
+			max_level = min(max_level, level);
+	}
 
 	return max_level;
 }
@@ -4491,10 +4495,10 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
 				 r == RET_PF_RETRY, fault->map_writable);
 }
 
-static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
-				       struct kvm_page_fault *fault)
+static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault)
 {
-	int max_order, r;
+	int gmem_order, r;
 
 	if (!kvm_slot_has_gmem(fault->slot)) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
@@ -4502,15 +4506,14 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	}
 
 	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
-			     &fault->refcounted_page, &max_order);
+			     &fault->refcounted_page, &gmem_order);
 	if (r) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return r;
 	}
 
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
-	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
-							 fault->max_level, max_order);
+	fault->max_level = kvm_max_level_for_fault_and_order(vcpu->kvm, fault, gmem_order);
 
 	return RET_PF_CONTINUE;
 }
@@ -4520,8 +4523,8 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 {
 	unsigned int foll = fault->write ? FOLL_WRITE : 0;
 
-	if (fault->is_private)
-		return kvm_mmu_faultin_pfn_private(vcpu, fault);
+	if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
+		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
 
 	foll |= FOLL_NOWAIT;
 	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2ec89c214978..de7b46ee1762 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2502,6 +2502,15 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
 }
 
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot);
+#else
+static inline bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
+{
+	return false;
+}
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
 {
@@ -2515,10 +2524,30 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
 bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 					 struct kvm_gfn_range *range);
 
+/*
+ * Returns true if the given gfn's private/shared status (in the CoCo sense) is
+ * private.
+ *
+ * A return value of false indicates that the gfn is explicitly or implicitly
+ * shared (i.e., non-CoCo VMs).
+ */
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
-	return IS_ENABLED(CONFIG_KVM_GMEM) &&
-	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
+	struct kvm_memory_slot *slot;
+
+	if (!IS_ENABLED(CONFIG_KVM_GMEM))
+		return false;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
+		/*
+		 * For now, memslots only support in-place shared memory if the
+		 * host is allowed to mmap memory (i.e., non-CoCo VMs).
+		 */
+		return false;
+	}
+
+	return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
 #else
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 2f499021df66..fe0245335c96 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -388,6 +388,23 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
 
 	return 0;
 }
+
+bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
+{
+	struct file *file;
+	bool ret;
+
+	file = kvm_gmem_get_file((struct kvm_memory_slot *)slot);
+	if (!file)
+		return false;
+
+	ret = kvm_gmem_supports_shared(file_inode(file));
+
+	fput(file);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_memslot_supports_shared);
+
 #else
 #define kvm_gmem_mmap NULL
 #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
-- 
2.49.0.1045.g170613ef41-goog




* [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (8 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-14  7:13   ` Shivank Garg
                     ` (2 more replies)
  2025-05-13 16:34 ` [PATCH v9 11/17] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
                   ` (6 subsequent siblings)
  16 siblings, 3 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

This patch adds kvm_gmem_max_mapping_level(), which for now always
resolves to PG_LEVEL_4K, since kvm_gmem_mapping_order() returns 0 and
guest_memfd only supports 4K pages.

When guest_memfd supports shared memory, max_mapping_level (especially
when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
from recover_huge_pages_range()) should take input from
guest_memfd.

Input from guest_memfd should be taken in these cases:

+ if the memslot supports shared memory (guest_memfd is used for
  shared memory, or in future both shared and private memory) or
+ if the memslot is only used for private memory and that gfn is
  private.

If the memslot doesn't use guest_memfd, figure out the
max_mapping_level using the host page tables like before.

This patch also refactors and inlines the other call to
__kvm_mmu_max_mapping_level().

In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
provided (if applicable) in fault->max_level. Hence, there is no need
to query guest_memfd.

lpage_info is queried like before, and then if the fault is not from
guest_memfd, adjust fault->req_level based on input from host page
tables.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
 include/linux/kvm_host.h |  7 +++
 virt/kvm/guest_memfd.c   | 12 ++++++
 3 files changed, 79 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cfbb471f7c70..9e0bc8114859 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
-static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
-				       const struct kvm_memory_slot *slot,
-				       gfn_t gfn, int max_level, bool is_private)
+static int kvm_lpage_info_max_mapping_level(struct kvm *kvm,
+					    const struct kvm_memory_slot *slot,
+					    gfn_t gfn, int max_level)
 {
 	struct kvm_lpage_info *linfo;
-	int host_level;
 
 	max_level = min(max_level, max_huge_page_level);
 	for ( ; max_level > PG_LEVEL_4K; max_level--) {
@@ -3270,23 +3269,61 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 			break;
 	}
 
-	if (is_private)
-		return max_level;
+	return max_level;
+}
+
+static inline u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
+					     gfn_t gfn, int max_level)
+{
+	int max_order;
 
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	host_level = host_pfn_mapping_level(kvm, gfn, slot);
-	return min(host_level, max_level);
+	max_order = kvm_gmem_mapping_order(slot, gfn);
+	return min(max_level, kvm_max_level_for_order(max_order));
 }
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	bool is_private = kvm_slot_has_gmem(slot) &&
-			  kvm_mem_is_private(kvm, gfn);
+	int max_level;
+
+	max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
 
-	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
+	if (kvm_slot_has_gmem(slot) &&
+	    (kvm_gmem_memslot_supports_shared(slot) ||
+	     kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+		return kvm_gmem_max_mapping_level(slot, gfn, max_level);
+	}
+
+	return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
+}
+
+static inline bool fault_from_gmem(struct kvm_page_fault *fault)
+{
+	return fault->is_private ||
+	       (kvm_slot_has_gmem(fault->slot) &&
+		kvm_gmem_memslot_supports_shared(fault->slot));
 }
 
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -3309,12 +3346,20 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * Enforce the iTLB multihit workaround after capturing the requested
 	 * level, which will be used to do precise, accurate accounting.
 	 */
-	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						       fault->gfn, fault->max_level,
-						       fault->is_private);
+	fault->req_level = kvm_lpage_info_max_mapping_level(vcpu->kvm, slot,
+							    fault->gfn, fault->max_level);
 	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
 		return;
 
+	if (!fault_from_gmem(fault)) {
+		int host_level;
+
+		host_level = host_pfn_mapping_level(vcpu->kvm, fault->gfn, slot);
+		fault->req_level = min(fault->req_level, host_level);
+		if (fault->req_level == PG_LEVEL_4K)
+			return;
+	}
+
 	/*
 	 * mmu_invalidate_retry() was successful and mmu_lock is held, so
 	 * the pmd can't be split from under us.
@@ -4448,23 +4493,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 		vcpu->stat.pf_fixed++;
 }
 
-static inline u8 kvm_max_level_for_order(int order)
-{
-	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
-
-	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
-		return PG_LEVEL_1G;
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
-		return PG_LEVEL_2M;
-
-	return PG_LEVEL_4K;
-}
-
 static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
 					    struct kvm_page_fault *fault,
 					    int order)
@@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 {
 	unsigned int foll = fault->write ? FOLL_WRITE : 0;
 
-	if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
+	if (fault_from_gmem(fault))
 		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
 
 	foll |= FOLL_NOWAIT;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index de7b46ee1762..f9bb025327c3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
+					 gfn_t gfn)
+{
+	BUG();
+	return 0;
+}
 #endif /* CONFIG_KVM_GMEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fe0245335c96..b8e247063b20 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -774,6 +774,18 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
+/*
+ * Returns the mapping order for this @gfn in @slot.
+ *
+ * This is equal to max_order that would be returned if kvm_gmem_get_pfn() were
+ * called now.
+ */
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
+
 #ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v9 11/17] KVM: arm64: Refactor user_mem_abort() calculation of force_pte
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (9 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-13 16:34 ` [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort() Fuad Tabba
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

To simplify the code and make the assumptions clearer, refactor
user_mem_abort() by setting force_pte at its declaration, where the
conditions are already known. Also, remove the comment claiming that
logging_active is guaranteed to never be true for VM_PFNMAP memslots,
since it is not actually correct.

No functional change intended.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index eeda92330ade..9865ada04a81 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1472,7 +1472,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  bool fault_is_perm)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
+	bool write_fault, writable;
 	bool exec_fault, mte_allowed;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
@@ -1484,6 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
+	bool force_pte = logging_active || is_protected_kvm_enabled();
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
@@ -1536,16 +1537,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
-	/*
-	 * logging_active is guaranteed to never be true for VM_PFNMAP
-	 * memslots.
-	 */
-	if (logging_active || is_protected_kvm_enabled()) {
-		force_pte = true;
+	if (force_pte)
 		vma_shift = PAGE_SHIFT;
-	} else {
+	else
 		vma_shift = get_vma_page_shift(vma, hva);
-	}
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort()
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (10 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 11/17] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  2:25   ` Gavin Shan
  2025-05-21  8:02   ` David Hildenbrand
  2025-05-13 16:34 ` [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
                   ` (4 subsequent siblings)
  16 siblings, 2 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Guest memory can be backed by guest_memfd or by anonymous memory. Since
such memory is not necessarily backed by a VMA, rename vma_shift to
page_shift and vma_pagesize to page_size to improve readability in
subsequent patches.

Suggested-by: James Houghton <jthoughton@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 54 ++++++++++++++++++++++----------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9865ada04a81..d756c2b5913f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1479,13 +1479,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct vm_area_struct *vma;
-	short vma_shift;
+	short page_shift;
 	void *memcache;
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
 	bool force_pte = logging_active || is_protected_kvm_enabled();
-	long vma_pagesize, fault_granule;
+	long page_size, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
@@ -1538,11 +1538,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	if (force_pte)
-		vma_shift = PAGE_SHIFT;
+		page_shift = PAGE_SHIFT;
 	else
-		vma_shift = get_vma_page_shift(vma, hva);
+		page_shift = get_vma_page_shift(vma, hva);
 
-	switch (vma_shift) {
+	switch (page_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
 	case PUD_SHIFT:
 		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
@@ -1550,23 +1550,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		fallthrough;
 #endif
 	case CONT_PMD_SHIFT:
-		vma_shift = PMD_SHIFT;
+		page_shift = PMD_SHIFT;
 		fallthrough;
 	case PMD_SHIFT:
 		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
 			break;
 		fallthrough;
 	case CONT_PTE_SHIFT:
-		vma_shift = PAGE_SHIFT;
+		page_shift = PAGE_SHIFT;
 		force_pte = true;
 		fallthrough;
 	case PAGE_SHIFT:
 		break;
 	default:
-		WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
+		WARN_ONCE(1, "Unknown page_shift %d", page_shift);
 	}
 
-	vma_pagesize = 1UL << vma_shift;
+	page_size = 1UL << page_shift;
 
 	if (nested) {
 		unsigned long max_map_size;
@@ -1592,7 +1592,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			max_map_size = PAGE_SIZE;
 
 		force_pte = (max_map_size == PAGE_SIZE);
-		vma_pagesize = min(vma_pagesize, (long)max_map_size);
+		page_size = min_t(long, page_size, max_map_size);
 	}
 
 	/*
@@ -1600,9 +1600,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * ensure we find the right PFN and lay down the mapping in the right
 	 * place.
 	 */
-	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
-		fault_ipa &= ~(vma_pagesize - 1);
-		ipa &= ~(vma_pagesize - 1);
+	if (page_size == PMD_SIZE || page_size == PUD_SIZE) {
+		fault_ipa &= ~(page_size - 1);
+		ipa &= ~(page_size - 1);
 	}
 
 	gfn = ipa >> PAGE_SHIFT;
@@ -1627,7 +1627,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
 				&writable, &page);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(hva, vma_shift);
+		kvm_send_hwpoison_signal(hva, page_shift);
 		return 0;
 	}
 	if (is_error_noslot_pfn(pfn))
@@ -1636,9 +1636,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (kvm_is_device_pfn(pfn)) {
 		/*
 		 * If the page was identified as device early by looking at
-		 * the VMA flags, vma_pagesize is already representing the
+		 * the VMA flags, page_size is already representing the
 		 * largest quantity we can map.  If instead it was mapped
-		 * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
+		 * via __kvm_faultin_pfn(), page_size is set to PAGE_SIZE
 		 * and must not be upgraded.
 		 *
 		 * In both cases, we don't let transparent_hugepage_adjust()
@@ -1686,16 +1686,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
-	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
+	if (page_size == PAGE_SIZE && !(force_pte || device)) {
 		if (fault_is_perm && fault_granule > PAGE_SIZE)
-			vma_pagesize = fault_granule;
+			page_size = fault_granule;
 		else
-			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
-								   hva, &pfn,
-								   &fault_ipa);
+			page_size = transparent_hugepage_adjust(kvm, memslot,
+								hva, &pfn,
+								&fault_ipa);
 
-		if (vma_pagesize < 0) {
-			ret = vma_pagesize;
+		if (page_size < 0) {
+			ret = page_size;
 			goto out_unlock;
 		}
 	}
@@ -1703,7 +1703,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
 		if (mte_allowed) {
-			sanitise_mte_tags(kvm, pfn, vma_pagesize);
+			sanitise_mte_tags(kvm, pfn, page_size);
 		} else {
 			ret = -EFAULT;
 			goto out_unlock;
@@ -1728,10 +1728,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	/*
 	 * Under the premise of getting a FSC_PERM fault, we just need to relax
-	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
+	 * permissions only if page_size equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_is_perm && vma_pagesize == fault_granule) {
+	if (fault_is_perm && page_size == fault_granule) {
 		/*
 		 * Drop the SW bits in favour of those stored in the
 		 * PTE, which will be preserved.
@@ -1739,7 +1739,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		prot &= ~KVM_NV_GUEST_MAP_SZ;
 		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
 	} else {
-		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, vma_pagesize,
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, page_size,
 					     __pfn_to_phys(pfn), prot,
 					     memcache, flags);
 	}
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (11 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort() Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-14 21:26   ` James Houghton
  2025-05-21  8:04   ` David Hildenbrand
  2025-05-13 16:34 ` [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
                   ` (3 subsequent siblings)
  16 siblings, 2 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Add arm64 support for handling guest page faults on guest_memfd
backed memslots.

For now, the fault granule is restricted to PAGE_SIZE.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c     | 94 +++++++++++++++++++++++++---------------
 include/linux/kvm_host.h |  5 +++
 virt/kvm/kvm_main.c      |  5 ---
 3 files changed, 64 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d756c2b5913f..9a48ef08491d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1466,6 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn, bool write_fault, bool *writable,
+			     struct page **page, bool is_gmem)
+{
+	kvm_pfn_t pfn;
+	int ret;
+
+	if (!is_gmem)
+		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
+
+	*writable = false;
+
+	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
+	if (!ret) {
+		*writable = !memslot_is_readonly(slot);
+		return pfn;
+	}
+
+	if (ret == -EHWPOISON)
+		return KVM_PFN_ERR_HWPOISON;
+
+	return KVM_PFN_ERR_FAULT;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1473,19 +1497,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
 	int ret = 0;
 	bool write_fault, writable;
-	bool exec_fault, mte_allowed;
+	bool exec_fault, mte_allowed = false;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
-	struct vm_area_struct *vma;
-	short page_shift;
+	struct vm_area_struct *vma = NULL;
+	short page_shift = PAGE_SHIFT;
 	void *memcache;
-	gfn_t gfn;
+	gfn_t gfn = ipa >> PAGE_SHIFT;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	bool force_pte = logging_active || is_protected_kvm_enabled();
-	long page_size, fault_granule;
+	bool is_gmem = kvm_slot_has_gmem(memslot);
+	bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
+	long page_size, fault_granule = PAGE_SIZE;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
@@ -1529,17 +1554,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * Let's check if we will get back a huge page backed by hugetlbfs, or
 	 * get block mapping for device MMIO region.
 	 */
-	mmap_read_lock(current->mm);
-	vma = vma_lookup(current->mm, hva);
-	if (unlikely(!vma)) {
-		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-		mmap_read_unlock(current->mm);
-		return -EFAULT;
+	if (!is_gmem) {
+		mmap_read_lock(current->mm);
+		vma = vma_lookup(current->mm, hva);
+		if (unlikely(!vma)) {
+			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+			mmap_read_unlock(current->mm);
+			return -EFAULT;
+		}
+
+		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
+		mte_allowed = kvm_vma_mte_allowed(vma);
 	}
 
-	if (force_pte)
-		page_shift = PAGE_SHIFT;
-	else
+	if (!force_pte)
 		page_shift = get_vma_page_shift(vma, hva);
 
 	switch (page_shift) {
@@ -1605,27 +1633,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ipa &= ~(page_size - 1);
 	}
 
-	gfn = ipa >> PAGE_SHIFT;
-	mte_allowed = kvm_vma_mte_allowed(vma);
-
-	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
-
-	/* Don't use the VMA after the unlock -- it may have vanished */
-	vma = NULL;
+	if (!is_gmem) {
+		/* Don't use the VMA after the unlock -- it may have vanished */
+		vma = NULL;
 
-	/*
-	 * Read mmu_invalidate_seq so that KVM can detect if the results of
-	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
-	 * acquiring kvm->mmu_lock.
-	 *
-	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
-	 * with the smp_wmb() in kvm_mmu_invalidate_end().
-	 */
-	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
-	mmap_read_unlock(current->mm);
+		/*
+		 * Read mmu_invalidate_seq so that KVM can detect if the results
+		 * of vma_lookup() or faultin_pfn() become stale prior to
+		 * acquiring kvm->mmu_lock.
+		 *
+		 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which
+		 * pairs with the smp_wmb() in kvm_mmu_invalidate_end().
+		 */
+		mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+		mmap_read_unlock(current->mm);
+	}
 
-	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
-				&writable, &page);
+	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, page_shift);
 		return 0;
@@ -1677,7 +1701,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	kvm_fault_lock(kvm);
 	pgt = vcpu->arch.hw_mmu->pgt;
-	if (mmu_invalidate_retry(kvm, mmu_seq)) {
+	if (!is_gmem && mmu_invalidate_retry(kvm, mmu_seq)) {
 		ret = -EAGAIN;
 		goto out_unlock;
 	}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f9bb025327c3..b317392453a5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1884,6 +1884,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
 	return gfn_to_memslot(kvm, gfn)->id;
 }
 
+static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_READONLY;
+}
+
 static inline gfn_t
 hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6289ea1685dd..6261d8638cd2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2640,11 +2640,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 	return size;
 }
 
-static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
-{
-	return slot->flags & KVM_MEM_READONLY;
-}
-
 static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write)
 {
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (12 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-15 23:50   ` James Houghton
  2025-05-21  8:05   ` David Hildenbrand
  2025-05-13 16:34 ` [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
                   ` (2 subsequent siblings)
  16 siblings, 2 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Enable mapping guest_memfd on arm64. For now, this applies to all
arm64 VMs that use guest_memfd. In the future, new VM types can
restrict this via kvm_arch_vm_supports_gmem_shared_mem().

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
 arch/arm64/kvm/Kconfig            |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 08ba91e6fb03..2514779f5131 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
 	return true;
 }
 
+static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
+{
+	return IS_ENABLED(CONFIG_KVM_GMEM);
+}
+
+static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
+{
+	return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
+}
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 096e45acadb2..8c1e1964b46a 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
+	select KVM_GMEM_SHARED_MEM
 	help
 	  Support hosting virtualized guest machines.
 
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (13 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  2:46   ` Gavin Shan
  2025-05-21  8:06   ` David Hildenbrand
  2025-05-13 16:34 ` [PATCH v9 16/17] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
  2025-05-13 16:34 ` [PATCH v9 17/17] KVM: selftests: Test guest_memfd same-range validation Fuad Tabba
  16 siblings, 2 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM, which indicates
that guest_memfd supports shared memory when the guest_memfd instance
is created with the GUEST_MEMFD_FLAG_SUPPORT_SHARED flag. This support
is limited to certain VM types, determined per architecture.

Also update the KVM documentation with details on the new capability,
the flag, and other information about shared memory support in
guest_memfd.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/kvm_main.c            |  4 ++++
 3 files changed, 23 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 47c7c3f92314..86f74ce7f12a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6390,6 +6390,24 @@ most one mapping per page, i.e. binding multiple memory regions to a single
 guest_memfd range is not allowed (any number of memory regions can be bound to
 a single guest_memfd file, but the bound ranges must not overlap).
 
+When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
+supports GUEST_MEMFD_FLAG_SUPPORT_SHARED.  Setting this flag on guest_memfd
+creation enables host userspace to mmap() and fault in guest_memfd memory.
+
+When the KVM MMU performs a PFN lookup to service a guest fault and the backing
+guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then the fault will
+always be consumed from guest_memfd, regardless of whether it is a shared or a
+private fault.
+
+For these memslots, KVM performs a best-effort check that userspace_addr is
+the mmap()-ed view of the same guest_memfd range specified via gmem.pgoff.
+Other accesses by KVM, e.g., instruction emulation, go via
+slot->userspace_addr.  Setting slot->userspace_addr to 0 skips this check,
+and indicates that KVM will not access the slot's memory via its userspace_addr.
+
+GUEST_MEMFD_FLAG_SUPPORT_SHARED is not allowed for CoCo VMs.  This is
+validated when the guest_memfd instance is bound to the VM.
+
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
 4.143 KVM_PRE_FAULT_MEMORY
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9857022a0f0c..4cc824a3a7c9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -930,6 +930,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
 #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
+#define KVM_CAP_GMEM_SHARED_MEM 240
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6261d8638cd2..6c75f933bfbe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4840,6 +4840,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_supports_gmem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+	case KVM_CAP_GMEM_SHARED_MEM:
+		return !kvm || kvm_arch_vm_supports_gmem_shared_mem(kvm);
 #endif
 	default:
 		break;
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v9 16/17] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (14 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  2025-05-21  6:53   ` Gavin Shan
  2025-05-13 16:34 ` [PATCH v9 17/17] KVM: selftests: Test guest_memfd same-range validation Fuad Tabba
  16 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Expand the guest_memfd selftests to cover mapping guest memory for VM
types that support it.

Also, build the guest_memfd selftest for arm64.

Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 145 +++++++++++++++---
 2 files changed, 126 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..ccf95ed037c3 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -163,6 +163,7 @@ TEST_GEN_PROGS_arm64 += access_tracking_perf_test
 TEST_GEN_PROGS_arm64 += arch_timer
 TEST_GEN_PROGS_arm64 += coalesced_io_test
 TEST_GEN_PROGS_arm64 += dirty_log_perf_test
+TEST_GEN_PROGS_arm64 += guest_memfd_test
 TEST_GEN_PROGS_arm64 += get-reg-list
 TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
 TEST_GEN_PROGS_arm64 += memslot_perf_test
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..443c49185543 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -34,12 +34,46 @@ static void test_file_read_write(int fd)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_allowed(int fd, size_t page_size, size_t total_size)
+{
+	const char val = 0xaa;
+	char *mem;
+	size_t i;
+	int ret;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() on guest memory should succeed.");
+
+	memset(mem, val, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			page_size);
+	TEST_ASSERT(!ret, "fallocate the first page should succeed");
+
+	for (i = 0; i < page_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0x00);
+	for (; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	memset(mem, val, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+}
+
+static void test_mmap_denied(int fd, size_t page_size, size_t total_size)
 {
 	char *mem;
 
 	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 	TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, MAP_FAILED);
 }
 
 static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -120,26 +154,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
 	}
 }
 
-static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
+static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
+						  uint64_t guest_memfd_flags,
+						  size_t page_size)
 {
-	size_t page_size = getpagesize();
-	uint64_t flag;
 	size_t size;
 	int fd;
 
 	for (size = 1; size < page_size; size++) {
-		fd = __vm_create_guest_memfd(vm, size, 0);
+		fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
 		TEST_ASSERT(fd == -1 && errno == EINVAL,
 			    "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
 			    size);
 	}
-
-	for (flag = BIT(0); flag; flag <<= 1) {
-		fd = __vm_create_guest_memfd(vm, page_size, flag);
-		TEST_ASSERT(fd == -1 && errno == EINVAL,
-			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
-			    flag);
-	}
 }
 
 static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
@@ -170,30 +197,108 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
 
-int main(int argc, char *argv[])
+static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
+			   bool expect_mmap_allowed)
 {
-	size_t page_size;
+	struct kvm_vm *vm;
 	size_t total_size;
+	size_t page_size;
 	int fd;
-	struct kvm_vm *vm;
 
-	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
+		return;
 
 	page_size = getpagesize();
 	total_size = page_size * 4;
 
-	vm = vm_create_barebones();
+	vm = vm_create_barebones_type(vm_type);
 
-	test_create_guest_memfd_invalid(vm);
 	test_create_guest_memfd_multiple(vm);
+	test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
 
-	fd = vm_create_guest_memfd(vm, total_size, 0);
+	fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
 
 	test_file_read_write(fd);
-	test_mmap(fd, page_size);
+
+	if (expect_mmap_allowed)
+		test_mmap_allowed(fd, page_size, total_size);
+	else
+		test_mmap_denied(fd, page_size, total_size);
+
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
 
 	close(fd);
+	kvm_vm_release(vm);
+}
+
+static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
+					    uint64_t expected_valid_flags)
+{
+	size_t page_size = getpagesize();
+	struct kvm_vm *vm;
+	uint64_t flag = 0;
+	int fd;
+
+	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
+		return;
+
+	vm = vm_create_barebones_type(vm_type);
+
+	for (flag = BIT(0); flag; flag <<= 1) {
+		fd = __vm_create_guest_memfd(vm, page_size, flag);
+
+		if (flag & expected_valid_flags) {
+			TEST_ASSERT(fd > 0,
+				    "guest_memfd() with flag '0x%lx' should be valid",
+				    flag);
+			close(fd);
+		} else {
+			TEST_ASSERT(fd == -1 && errno == EINVAL,
+				    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
+				    flag);
+		}
+	}
+
+	kvm_vm_release(vm);
+}
+
+static void test_gmem_flag_validity(void)
+{
+	uint64_t non_coco_vm_valid_flags = 0;
+
+	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
+		non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+
+	test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
+
+#ifdef __x86_64__
+	test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, non_coco_vm_valid_flags);
+	test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
+#endif
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+
+	test_gmem_flag_validity();
+
+	test_with_type(VM_TYPE_DEFAULT, 0, false);
+	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
+		test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_SUPPORT_SHARED,
+			       true);
+	}
+
+#ifdef __x86_64__
+	test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
+	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
+		test_with_type(KVM_X86_SW_PROTECTED_VM,
+			       GUEST_MEMFD_FLAG_SUPPORT_SHARED, true);
+	}
+#endif
 }
-- 
2.49.0.1045.g170613ef41-goog



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v9 17/17] KVM: selftests: Test guest_memfd same-range validation
  2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (15 preceding siblings ...)
  2025-05-13 16:34 ` [PATCH v9 16/17] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
@ 2025-05-13 16:34 ` Fuad Tabba
  16 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-13 16:34 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

Add some selftests for guest_memfd same-range validation, which check
that the slot's userspace_addr covers the same range as the memory in
guest_memfd:

+ When slot->userspace_addr is set to 0, there should be no range
  match validation on guest_memfd binding.
+ guest_memfd binding should fail if
    + slot->userspace_addr is not from guest_memfd
    + slot->userspace_addr is mmap()ed from some other file
    + slot->userspace_addr is mmap()ed from some other guest_memfd
    + slot->userspace_addr is mmap()ed from a different range in the
      same guest_memfd
+ guest_memfd binding should succeed if slot->userspace_addr is
  mmap()ed from the same range in the same guest_memfd provided in
  slot->guest_memfd

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 168 ++++++++++++++++++
 1 file changed, 168 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 443c49185543..60aaba5808a5 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -197,6 +197,173 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
 
+#define GUEST_MEMFD_TEST_SLOT 10
+#define GUEST_MEMFD_TEST_GPA 0x100000000
+
+static void
+test_bind_guest_memfd_disabling_range_match_validation(struct kvm_vm *vm,
+						       int fd)
+{
+	size_t page_size = getpagesize();
+	int ret;
+
+	ret = __vm_set_user_memory_region2(vm, GUEST_MEMFD_TEST_SLOT,
+					   KVM_MEM_GUEST_MEMFD,
+					   GUEST_MEMFD_TEST_GPA, page_size, 0,
+					   fd, 0);
+	TEST_ASSERT(!ret,
+		    "setting slot->userspace_addr to 0 should disable validation");
+	ret = __vm_set_user_memory_region2(vm, GUEST_MEMFD_TEST_SLOT,
+					   KVM_MEM_GUEST_MEMFD,
+					   GUEST_MEMFD_TEST_GPA, 0, 0,
+					   fd, 0);
+	TEST_ASSERT(!ret, "Deleting memslot should work");
+}
+
+static void
+test_bind_guest_memfd_anon_memory_in_userspace_addr(struct kvm_vm *vm, int fd)
+{
+	size_t page_size = getpagesize();
+	void *userspace_addr;
+	int ret;
+
+	userspace_addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+	ret = __vm_set_user_memory_region2(vm, GUEST_MEMFD_TEST_SLOT,
+					   KVM_MEM_GUEST_MEMFD,
+					   GUEST_MEMFD_TEST_GPA, page_size,
+					   userspace_addr, fd, 0);
+	TEST_ASSERT(ret == -1,
+		    "slot->userspace_addr is not from the guest_memfd and should fail");
+}
+
+static void test_bind_guest_memfd_shared_memory_other_file_in_userspace_addr(
+	struct kvm_vm *vm, int fd)
+{
+	size_t page_size = getpagesize();
+	void *userspace_addr;
+	int other_fd;
+	int ret;
+
+	other_fd = memfd_create("shared_memory_other_file", 0);
+	TEST_ASSERT(other_fd > 0, "Creating other file should succeed");
+
+	userspace_addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+			      MAP_SHARED, other_fd, 0);
+
+	ret = __vm_set_user_memory_region2(vm, GUEST_MEMFD_TEST_SLOT,
+					   KVM_MEM_GUEST_MEMFD,
+					   GUEST_MEMFD_TEST_GPA, page_size,
+					   userspace_addr, fd, 0);
+	TEST_ASSERT(ret == -1,
+		    "slot->userspace_addr is not from the guest_memfd and should fail");
+
+	TEST_ASSERT(!munmap(userspace_addr, page_size),
+		    "munmap() to cleanup should succeed");
+
+	close(other_fd);
+}
+
+static void
+test_bind_guest_memfd_other_guest_memfd_in_userspace_addr(struct kvm_vm *vm,
+							  int fd)
+{
+	size_t page_size = getpagesize();
+	void *userspace_addr;
+	int other_fd;
+	int ret;
+
+	other_fd = vm_create_guest_memfd(vm, page_size * 2,
+					 GUEST_MEMFD_FLAG_SUPPORT_SHARED);
+	TEST_ASSERT(other_fd > 0, "Creating other file should succeed");
+
+	userspace_addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+			      MAP_SHARED, other_fd, 0);
+
+	ret = __vm_set_user_memory_region2(vm, GUEST_MEMFD_TEST_SLOT,
+					   KVM_MEM_GUEST_MEMFD,
+					   GUEST_MEMFD_TEST_GPA, page_size,
+					   userspace_addr, fd, 0);
+	TEST_ASSERT(ret == -1,
+		    "slot->userspace_addr is not from the guest_memfd and should fail");
+
+	TEST_ASSERT(!munmap(userspace_addr, page_size),
+		    "munmap() to cleanup should succeed");
+
+	close(other_fd);
+}
+
+static void
+test_bind_guest_memfd_other_range_in_userspace_addr(struct kvm_vm *vm, int fd)
+{
+	size_t page_size = getpagesize();
+	void *userspace_addr;
+	int ret;
+
+	userspace_addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+			      MAP_SHARED, fd, page_size);
+
+	ret = __vm_set_user_memory_region2(vm, GUEST_MEMFD_TEST_SLOT,
+					   KVM_MEM_GUEST_MEMFD,
+					   GUEST_MEMFD_TEST_GPA, page_size,
+					   userspace_addr, fd, 0);
+	TEST_ASSERT(ret == -1,
+		    "slot->userspace_addr is not from the same range and should fail");
+
+	TEST_ASSERT(!munmap(userspace_addr, page_size),
+		    "munmap() to cleanup should succeed");
+}
+
+static void
+test_bind_guest_memfd_same_range_in_userspace_addr(struct kvm_vm *vm, int fd)
+{
+	size_t page_size = getpagesize();
+	void *userspace_addr;
+	int ret;
+
+	userspace_addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+			      MAP_SHARED, fd, page_size);
+
+	ret = __vm_set_user_memory_region2(vm, GUEST_MEMFD_TEST_SLOT,
+					   KVM_MEM_GUEST_MEMFD,
+					   GUEST_MEMFD_TEST_GPA, page_size,
+					   userspace_addr, fd, page_size);
+	TEST_ASSERT(!ret,
+		    "slot->userspace_addr is the same range and should succeed");
+
+	TEST_ASSERT(!munmap(userspace_addr, page_size),
+		    "munmap() to cleanup should succeed");
+
+	ret = __vm_set_user_memory_region2(vm, GUEST_MEMFD_TEST_SLOT,
+					   KVM_MEM_GUEST_MEMFD,
+					   GUEST_MEMFD_TEST_GPA, 0, 0,
+					   fd, 0);
+	TEST_ASSERT(!ret, "Deleting memslot should work");
+}
+
+static void test_bind_guest_memfd_wrt_userspace_addr(struct kvm_vm *vm)
+{
+	size_t page_size = getpagesize();
+	int fd;
+
+	if (!vm_check_cap(vm, KVM_CAP_GUEST_MEMFD) ||
+	    !vm_check_cap(vm, KVM_CAP_GMEM_SHARED_MEM))
+		return;
+
+	fd = vm_create_guest_memfd(vm, page_size * 2,
+				   GUEST_MEMFD_FLAG_SUPPORT_SHARED);
+
+	test_bind_guest_memfd_disabling_range_match_validation(vm, fd);
+	test_bind_guest_memfd_anon_memory_in_userspace_addr(vm, fd);
+	test_bind_guest_memfd_shared_memory_other_file_in_userspace_addr(vm, fd);
+	test_bind_guest_memfd_other_guest_memfd_in_userspace_addr(vm, fd);
+	test_bind_guest_memfd_other_range_in_userspace_addr(vm, fd);
+	test_bind_guest_memfd_same_range_in_userspace_addr(vm, fd);
+
+	close(fd);
+}
+
 static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
 			   bool expect_mmap_allowed)
 {
@@ -214,6 +381,7 @@ static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
 	vm = vm_create_barebones_type(vm_type);
 
 	test_create_guest_memfd_multiple(vm);
+	test_bind_guest_memfd_wrt_userspace_addr(vm);
 	test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
 
 	fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);




* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
@ 2025-05-13 18:37   ` Ackerley Tng
  2025-05-16 19:21     ` James Houghton
  2025-05-14  8:03   ` Shivank Garg
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 88+ messages in thread
From: Ackerley Tng @ 2025-05-13 18:37 UTC (permalink / raw)
  To: Fuad Tabba

Fuad Tabba <tabba@google.com> writes:

> This patch enables support for shared memory in guest_memfd, including
> mapping that memory at the host userspace. This support is gated by the
> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> guest_memfd instance.
>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 10 ++++
>  include/linux/kvm_host.h        | 13 +++++
>  include/uapi/linux/kvm.h        |  1 +
>  virt/kvm/Kconfig                |  5 ++
>  virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
>  5 files changed, 117 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 709cc2a7ba66..f72722949cae 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  
>  #ifdef CONFIG_KVM_GMEM
>  #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> +
> +/*
> + * CoCo VMs with hardware support that use guest_memfd only for backing private
> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> + */
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm)			\
> +	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&			\
> +	 ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||		\
> +	  (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
>  #else
>  #define kvm_arch_supports_gmem(kvm) false
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
>  #endif
>  
>  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ae70e4e19700..2ec89c214978 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>  }
>  #endif
>  
> +/*
> + * Returns true if this VM supports shared mem in guest_memfd.
> + *
> + * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
> + * guest_memfd is enabled.
> + */
> +#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> +{
> +	return false;
> +}
> +#endif
> +
>  #ifndef kvm_arch_has_readonly_mem
>  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>  {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6ae8ad8934b..9857022a0f0c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>  
>  #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1UL << 0)
>  
>  struct kvm_create_guest_memfd {
>  	__u64 size;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..f4e469a62a60 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>  config HAVE_KVM_ARCH_GMEM_INVALIDATE
>         bool
>         depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> +       select KVM_GMEM
> +       bool
> +       prompt "Enables in-place shared memory for guest_memfd"
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..8e6d1866b55e 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>  	return gfn - slot->base_gfn + slot->gmem.pgoff;
>  }
>  
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +
> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> +	uint64_t flags = (uint64_t)inode->i_private;
> +
> +	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	struct folio *folio;
> +	vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +	filemap_invalidate_lock_shared(inode->i_mapping);
> +
> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +	if (IS_ERR(folio)) {
> +		int err = PTR_ERR(folio);
> +
> +		if (err == -EAGAIN)
> +			ret = VM_FAULT_RETRY;
> +		else
> +			ret = vmf_error(err);
> +
> +		goto out_filemap;
> +	}
> +
> +	if (folio_test_hwpoison(folio)) {
> +		ret = VM_FAULT_HWPOISON;
> +		goto out_folio;
> +	}
> +
> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	if (!folio_test_uptodate(folio)) {
> +		clear_highpage(folio_page(folio, 0));
> +		kvm_gmem_mark_prepared(folio);
> +	}
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +	if (ret != VM_FAULT_LOCKED) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
> +out_filemap:
> +	filemap_invalidate_unlock_shared(inode->i_mapping);

Do we need to hold the filemap_invalidate_lock while zeroing? Would
holding the folio lock be enough?

> +
> +	return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +	.fault = kvm_gmem_fault_shared,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	if (!kvm_gmem_supports_shared(file_inode(file)))
> +		return -ENODEV;
> +
> +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> +	    (VM_SHARED | VM_MAYSHARE)) {
> +		return -EINVAL;
> +	}
> +
> +	vma->vm_ops = &kvm_gmem_vm_ops;
> +
> +	return 0;
> +}
> +#else
> +#define kvm_gmem_mmap NULL
> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> +
>  static struct file_operations kvm_gmem_fops = {
> +	.mmap		= kvm_gmem_mmap,
>  	.open		= generic_file_open,
>  	.release	= kvm_gmem_release,
>  	.fallocate	= kvm_gmem_fallocate,
> @@ -463,6 +544,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  	u64 flags = args->flags;
>  	u64 valid_flags = 0;
>  
> +	if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +		valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
>  	if (flags & ~valid_flags)
>  		return -EINVAL;
>  
> @@ -501,6 +585,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>  	    offset + size > i_size_read(inode))
>  		goto err;
>  
> +	if (kvm_gmem_supports_shared(inode) &&
> +	    !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +		goto err;
> +
>  	filemap_invalidate_lock(inode->i_mapping);
>  
>  	start = offset >> PAGE_SHIFT;



* Re: [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range
  2025-05-13 16:34 ` [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range Fuad Tabba
@ 2025-05-13 20:30   ` James Houghton
  2025-05-14  7:33     ` Fuad Tabba
  2025-05-14 17:39   ` David Hildenbrand
  1 sibling, 1 reply; 88+ messages in thread
From: James Houghton @ 2025-05-13 20:30 UTC (permalink / raw)
  To: Fuad Tabba

On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> On binding of a guest_memfd with a memslot, check that the slot's
> userspace_addr and the requested fd and offset refer to the same memory
> range.
>
> This check is best-effort: nothing prevents userspace from later mapping
> other memory at the address provided in slot->userspace_addr and breaking
> guest operation.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  virt/kvm/guest_memfd.c | 37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 8e6d1866b55e..2f499021df66 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>         return __kvm_gmem_create(kvm, size, flags);
>  }
>
> +static bool kvm_gmem_is_same_range(struct kvm *kvm,
> +                                  struct kvm_memory_slot *slot,
> +                                  struct file *file, loff_t offset)
> +{
> +       struct mm_struct *mm = kvm->mm;
> +       loff_t userspace_addr_offset;
> +       struct vm_area_struct *vma;
> +       bool ret = false;
> +
> +       mmap_read_lock(mm);
> +
> +       vma = vma_lookup(mm, slot->userspace_addr);
> +       if (!vma)
> +               goto out;
> +
> +       if (vma->vm_file != file)
> +               goto out;
> +
> +       userspace_addr_offset = slot->userspace_addr - vma->vm_start;
> +       ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset;
> +out:
> +       mmap_read_unlock(mm);
> +
> +       return ret;
> +}
> +
>  int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>                   unsigned int fd, loff_t offset)
>  {
> @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>             offset + size > i_size_read(inode))
>                 goto err;
>
> -       if (kvm_gmem_supports_shared(inode) &&
> -           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> -               goto err;
> +       if (kvm_gmem_supports_shared(inode)) {
> +               if (!kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +                       goto err;
> +
> +               if (slot->userspace_addr &&
> +                   !kvm_gmem_is_same_range(kvm, slot, file, offset))
> +                       goto err;

This is very nit-picky, but I would rather this not be -EINVAL, maybe
-EIO instead? Or maybe a pr_warn_once() and let the call proceed?

The userspace_addr we got isn't invalid per se, we're just trying to
give a hint to the user that their VMAs (or the userspace address they
gave us) are messed up. I don't really like lumping this in with truly
invalid arguments.

> +       }
>
>         filemap_invalidate_lock(inode->i_mapping);
>



* Re: [PATCH v9 02/17] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-05-13 16:34 ` [PATCH v9 02/17] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
@ 2025-05-13 21:56   ` Ira Weiny
  2025-05-21  7:14   ` Gavin Shan
  1 sibling, 0 replies; 88+ messages in thread
From: Ira Weiny @ 2025-05-13 21:56 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm

Fuad Tabba wrote:
> The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
> guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
> clearer.
> 
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>



* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-13 16:34 ` [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd Fuad Tabba
@ 2025-05-14  7:13   ` Shivank Garg
  2025-05-14  7:24     ` Fuad Tabba
  2025-05-14 15:27   ` kernel test robot
  2025-05-21  8:01   ` David Hildenbrand
  2 siblings, 1 reply; 88+ messages in thread
From: Shivank Garg @ 2025-05-14  7:13 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm

On 5/13/2025 10:04 PM, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> This patch adds kvm_gmem_max_mapping_level(), which always returns
> PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
> 
> When guest_memfd supports shared memory, max_mapping_level (especially
> when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
> from recover_huge_pages_range()) should take input from
> guest_memfd.
> 
> Input from guest_memfd should be taken in these cases:
> 
> + if the memslot supports shared memory (guest_memfd is used for
>   shared memory, or in future both shared and private memory) or
> + if the memslot is only used for private memory and that gfn is
>   private.
> 
> If the memslot doesn't use guest_memfd, figure out the
> max_mapping_level using the host page tables like before.
> 
> This patch also refactors and inlines the other call to
> __kvm_mmu_max_mapping_level().
> 
> In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
> provided (if applicable) in fault->max_level. Hence, there is no need
> to query guest_memfd.
> 
> lpage_info is queried like before, and then if the fault is not from
> guest_memfd, adjust fault->req_level based on input from host page
> tables.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
>  include/linux/kvm_host.h |  7 +++
>  virt/kvm/guest_memfd.c   | 12 ++++++
>  3 files changed, 79 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index cfbb471f7c70..9e0bc8114859 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>  	return level;
>  }
>  
> -static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> -				       const struct kvm_memory_slot *slot,
> -				       gfn_t gfn, int max_level, bool is_private)
> +static int kvm_lpage_info_max_mapping_level(struct kvm *kvm,
> +					    const struct kvm_memory_slot *slot,
> +					    gfn_t gfn, int max_level)
>  {
>  	struct kvm_lpage_info *linfo;
> -	int host_level;
>  
>  	max_level = min(max_level, max_huge_page_level);
>  	for ( ; max_level > PG_LEVEL_4K; max_level--) {
> @@ -3270,23 +3269,61 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
>  			break;
>  	}
>  
> -	if (is_private)
> -		return max_level;
> +	return max_level;
> +}
> +
> +static inline u8 kvm_max_level_for_order(int order)
> +{
> +	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
> +
> +	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
> +			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
> +			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
> +
> +	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
> +		return PG_LEVEL_1G;
> +
> +	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> +		return PG_LEVEL_2M;
> +
> +	return PG_LEVEL_4K;
> +}
> +
> +static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
> +					     gfn_t gfn, int max_level)
> +{
> +	int max_order;
>  
>  	if (max_level == PG_LEVEL_4K)
>  		return PG_LEVEL_4K;
>  
> -	host_level = host_pfn_mapping_level(kvm, gfn, slot);
> -	return min(host_level, max_level);
> +	max_order = kvm_gmem_mapping_order(slot, gfn);
> +	return min(max_level, kvm_max_level_for_order(max_order));
>  }
>  
>  int kvm_mmu_max_mapping_level(struct kvm *kvm,
>  			      const struct kvm_memory_slot *slot, gfn_t gfn)
>  {
> -	bool is_private = kvm_slot_has_gmem(slot) &&
> -			  kvm_mem_is_private(kvm, gfn);
> +	int max_level;
> +
> +	max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
> +	if (max_level == PG_LEVEL_4K)
> +		return PG_LEVEL_4K;
>  
> -	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
> +	if (kvm_slot_has_gmem(slot) &&
> +	    (kvm_gmem_memslot_supports_shared(slot) ||
> +	     kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> +		return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> +	}
> +
> +	return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
> +}
> +
> +static inline bool fault_from_gmem(struct kvm_page_fault *fault)
> +{
> +	return fault->is_private ||
> +	       (kvm_slot_has_gmem(fault->slot) &&
> +		kvm_gmem_memslot_supports_shared(fault->slot));
>  }
>  
>  void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> @@ -3309,12 +3346,20 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	 * Enforce the iTLB multihit workaround after capturing the requested
>  	 * level, which will be used to do precise, accurate accounting.
>  	 */
> -	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
> -						       fault->gfn, fault->max_level,
> -						       fault->is_private);
> +	fault->req_level = kvm_lpage_info_max_mapping_level(vcpu->kvm, slot,
> +							    fault->gfn, fault->max_level);
>  	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
>  		return;
>  
> +	if (!fault_from_gmem(fault)) {
> +		int host_level;
> +
> +		host_level = host_pfn_mapping_level(vcpu->kvm, fault->gfn, slot);
> +		fault->req_level = min(fault->req_level, host_level);
> +		if (fault->req_level == PG_LEVEL_4K)
> +			return;
> +	}
> +
>  	/*
>  	 * mmu_invalidate_retry() was successful and mmu_lock is held, so
>  	 * the pmd can't be split from under us.
> @@ -4448,23 +4493,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  		vcpu->stat.pf_fixed++;
>  }
>  
> -static inline u8 kvm_max_level_for_order(int order)
> -{
> -	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
> -
> -	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
> -			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
> -			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
> -
> -	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
> -		return PG_LEVEL_1G;
> -
> -	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> -		return PG_LEVEL_2M;
> -
> -	return PG_LEVEL_4K;
> -}
> -
>  static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>  					    struct kvm_page_fault *fault,
>  					    int order)
> @@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>  {
>  	unsigned int foll = fault->write ? FOLL_WRITE : 0;
>  
> -	if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
> +	if (fault_from_gmem(fault))
>  		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>  
>  	foll |= FOLL_NOWAIT;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index de7b46ee1762..f9bb025327c3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>  		     int *max_order);
> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
>  #else
>  static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot, gfn_t gfn,
> @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>  	KVM_BUG_ON(1, kvm);
>  	return -EIO;
>  }
> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
> +					 gfn_t gfn)
> +{
> +	BUG();
> +	return 0;
> +}
>  #endif /* CONFIG_KVM_GMEM */
>  
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index fe0245335c96..b8e247063b20 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -774,6 +774,18 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  }
>  EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
>  
> +/**
> + * Returns the mapping order for this @gfn in @slot.
> + *
> + * This is equal to max_order that would be returned if kvm_gmem_get_pfn() were
> + * called now.
> + */
make W=1 ./ -s generates the following warning:

warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Returns the mapping order for this @gfn in @slot

This will fix it.

Subject: [PATCH] tmp

Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 virt/kvm/guest_memfd.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b8e247063b20..d880b9098cc0 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -775,10 +775,12 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
 /**
- * Returns the mapping order for this @gfn in @slot.
+ * kvm_gmem_mapping_order - Get the mapping order for a GFN.
+ * @slot: The KVM memory slot containing the @gfn.
+ * @gfn: The guest frame number to check.
  *
- * This is equal to max_order that would be returned if kvm_gmem_get_pfn() were
- * called now.
+ * Returns: The mapping order for a @gfn in @slot. This is equal to max_order
+ *          that kvm_gmem_get_pfn() would return for this @gfn.
  */
 int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-- 
2.34.1

Thanks,
Shivank


> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
> +
>  #ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
>  long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
>  		       kvm_gmem_populate_cb post_populate, void *opaque)



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-14  7:13   ` Shivank Garg
@ 2025-05-14  7:24     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-14  7:24 UTC (permalink / raw)
  To: Shivank Garg
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Wed, 14 May 2025 at 08:14, Shivank Garg <shivankg@amd.com> wrote:
>
> On 5/13/2025 10:04 PM, Fuad Tabba wrote:
> > From: Ackerley Tng <ackerleytng@google.com>
> >
> > This patch adds kvm_gmem_max_mapping_level(), which always returns
> > PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
> >
> > When guest_memfd supports shared memory, max_mapping_level (especially
> > when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
> > from recover_huge_pages_range()) should take input from
> > guest_memfd.
> >
> > Input from guest_memfd should be taken in these cases:
> >
> > + if the memslot supports shared memory (guest_memfd is used for
> >   shared memory, or in future both shared and private memory) or
> > + if the memslot is only used for private memory and that gfn is
> >   private.
> >
> > If the memslot doesn't use guest_memfd, figure out the
> > max_mapping_level using the host page tables like before.
> >
> > This patch also refactors and inlines the other call to
> > __kvm_mmu_max_mapping_level().
> >
> > In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
> > provided (if applicable) in fault->max_level. Hence, there is no need
> > to query guest_memfd.
> >
> > lpage_info is queried like before, and then if the fault is not from
> > guest_memfd, adjust fault->req_level based on input from host page
> > tables.
> >
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
> >  include/linux/kvm_host.h |  7 +++
> >  virt/kvm/guest_memfd.c   | 12 ++++++
> >  3 files changed, 79 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index cfbb471f7c70..9e0bc8114859 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
> >       return level;
> >  }
> >
> > -static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> > -                                    const struct kvm_memory_slot *slot,
> > -                                    gfn_t gfn, int max_level, bool is_private)
> > +static int kvm_lpage_info_max_mapping_level(struct kvm *kvm,
> > +                                         const struct kvm_memory_slot *slot,
> > +                                         gfn_t gfn, int max_level)
> >  {
> >       struct kvm_lpage_info *linfo;
> > -     int host_level;
> >
> >       max_level = min(max_level, max_huge_page_level);
> >       for ( ; max_level > PG_LEVEL_4K; max_level--) {
> > @@ -3270,23 +3269,61 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> >                       break;
> >       }
> >
> > -     if (is_private)
> > -             return max_level;
> > +     return max_level;
> > +}
> > +
> > +static inline u8 kvm_max_level_for_order(int order)
> > +{
> > +     BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
> > +
> > +     KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
> > +                     order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
> > +                     order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
> > +
> > +     if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
> > +             return PG_LEVEL_1G;
> > +
> > +     if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> > +             return PG_LEVEL_2M;
> > +
> > +     return PG_LEVEL_4K;
> > +}
> > +
> > +static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
> > +                                          gfn_t gfn, int max_level)
> > +{
> > +     int max_order;
> >
> >       if (max_level == PG_LEVEL_4K)
> >               return PG_LEVEL_4K;
> >
> > -     host_level = host_pfn_mapping_level(kvm, gfn, slot);
> > -     return min(host_level, max_level);
> > +     max_order = kvm_gmem_mapping_order(slot, gfn);
> > +     return min(max_level, kvm_max_level_for_order(max_order));
> >  }
> >
> >  int kvm_mmu_max_mapping_level(struct kvm *kvm,
> >                             const struct kvm_memory_slot *slot, gfn_t gfn)
> >  {
> > -     bool is_private = kvm_slot_has_gmem(slot) &&
> > -                       kvm_mem_is_private(kvm, gfn);
> > +     int max_level;
> > +
> > +     max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
> > +     if (max_level == PG_LEVEL_4K)
> > +             return PG_LEVEL_4K;
> >
> > -     return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
> > +     if (kvm_slot_has_gmem(slot) &&
> > +         (kvm_gmem_memslot_supports_shared(slot) ||
> > +          kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> > +             return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> > +     }
> > +
> > +     return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
> > +}
> > +
> > +static inline bool fault_from_gmem(struct kvm_page_fault *fault)
> > +{
> > +     return fault->is_private ||
> > +            (kvm_slot_has_gmem(fault->slot) &&
> > +             kvm_gmem_memslot_supports_shared(fault->slot));
> >  }
> >
> >  void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > @@ -3309,12 +3346,20 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> >        * Enforce the iTLB multihit workaround after capturing the requested
> >        * level, which will be used to do precise, accurate accounting.
> >        */
> > -     fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
> > -                                                    fault->gfn, fault->max_level,
> > -                                                    fault->is_private);
> > +     fault->req_level = kvm_lpage_info_max_mapping_level(vcpu->kvm, slot,
> > +                                                         fault->gfn, fault->max_level);
> >       if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
> >               return;
> >
> > +     if (!fault_from_gmem(fault)) {
> > +             int host_level;
> > +
> > +             host_level = host_pfn_mapping_level(vcpu->kvm, fault->gfn, slot);
> > +             fault->req_level = min(fault->req_level, host_level);
> > +             if (fault->req_level == PG_LEVEL_4K)
> > +                     return;
> > +     }
> > +
> >       /*
> >        * mmu_invalidate_retry() was successful and mmu_lock is held, so
> >        * the pmd can't be split from under us.
> > @@ -4448,23 +4493,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
> >               vcpu->stat.pf_fixed++;
> >  }
> >
> > -static inline u8 kvm_max_level_for_order(int order)
> > -{
> > -     BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
> > -
> > -     KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
> > -                     order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
> > -                     order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
> > -
> > -     if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
> > -             return PG_LEVEL_1G;
> > -
> > -     if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> > -             return PG_LEVEL_2M;
> > -
> > -     return PG_LEVEL_4K;
> > -}
> > -
> >  static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
> >                                           struct kvm_page_fault *fault,
> >                                           int order)
> > @@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> >  {
> >       unsigned int foll = fault->write ? FOLL_WRITE : 0;
> >
> > -     if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
> > +     if (fault_from_gmem(fault))
> >               return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
> >
> >       foll |= FOLL_NOWAIT;
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index de7b46ee1762..f9bb025327c3 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >  int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >                    gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
> >                    int *max_order);
> > +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
> >  #else
> >  static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >                                  struct kvm_memory_slot *slot, gfn_t gfn,
> > @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >       KVM_BUG_ON(1, kvm);
> >       return -EIO;
> >  }
> > +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
> > +                                      gfn_t gfn)
> > +{
> > +     BUG();
> > +     return 0;
> > +}
> >  #endif /* CONFIG_KVM_GMEM */
> >
> >  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index fe0245335c96..b8e247063b20 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -774,6 +774,18 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
> >
> > +/**
> > + * Returns the mapping order for this @gfn in @slot.
> > + *
> > + * This is equal to max_order that would be returned if kvm_gmem_get_pfn() were
> > + * called now.
> > + */
> make W=1 ./ -s generates the following warning:
>
> warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
>  * Returns the mapping order for this @gfn in @slot
>
> This will fix it.

Thank you!
/fuad

> Subject: [PATCH] tmp
>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>  virt/kvm/guest_memfd.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index b8e247063b20..d880b9098cc0 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -775,10 +775,12 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
>
>  /**
> - * Returns the mapping order for this @gfn in @slot.
> + * kvm_gmem_mapping_order - Get the mapping order for a GFN.
> + * @slot: The KVM memory slot containing the @gfn.
> + * @gfn: The guest frame number to check.
>   *
> - * This is equal to max_order that would be returned if kvm_gmem_get_pfn() were
> - * called now.
> + * Returns: The mapping order for a @gfn in @slot. This is equal to max_order
> + *          that kvm_gmem_get_pfn() would return for this @gfn.
>   */
>  int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
>  {
> --
> 2.34.1
>
> Thanks,
> Shivank
>
>
> > +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
> > +{
> > +     return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
> > +
> >  #ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
> >  long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
> >                      kvm_gmem_populate_cb post_populate, void *opaque)
>



* Re: [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range
  2025-05-13 20:30   ` James Houghton
@ 2025-05-14  7:33     ` Fuad Tabba
  2025-05-14 13:32       ` Sean Christopherson
  0 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-14  7:33 UTC (permalink / raw)
  To: James Houghton
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

Hi James,

On Tue, 13 May 2025 at 21:31, James Houghton <jthoughton@google.com> wrote:
>
> On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > From: Ackerley Tng <ackerleytng@google.com>
> >
> > On binding of a guest_memfd with a memslot, check that the slot's
> > userspace_addr and the requested fd and offset refer to the same memory
> > range.
> >
> > This check is best-effort: nothing prevents userspace from later mapping
> > other memory to the same provided in slot->userspace_addr and breaking
> > guest operation.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  virt/kvm/guest_memfd.c | 37 ++++++++++++++++++++++++++++++++++---
> >  1 file changed, 34 insertions(+), 3 deletions(-)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 8e6d1866b55e..2f499021df66 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> >         return __kvm_gmem_create(kvm, size, flags);
> >  }
> >
> > +static bool kvm_gmem_is_same_range(struct kvm *kvm,
> > +                                  struct kvm_memory_slot *slot,
> > +                                  struct file *file, loff_t offset)
> > +{
> > +       struct mm_struct *mm = kvm->mm;
> > +       loff_t userspace_addr_offset;
> > +       struct vm_area_struct *vma;
> > +       bool ret = false;
> > +
> > +       mmap_read_lock(mm);
> > +
> > +       vma = vma_lookup(mm, slot->userspace_addr);
> > +       if (!vma)
> > +               goto out;
> > +
> > +       if (vma->vm_file != file)
> > +               goto out;
> > +
> > +       userspace_addr_offset = slot->userspace_addr - vma->vm_start;
> > +       ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset;
> > +out:
> > +       mmap_read_unlock(mm);
> > +
> > +       return ret;
> > +}
> > +
> >  int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> >                   unsigned int fd, loff_t offset)
> >  {
> > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> >             offset + size > i_size_read(inode))
> >                 goto err;
> >
> > -       if (kvm_gmem_supports_shared(inode) &&
> > -           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > -               goto err;
> > +       if (kvm_gmem_supports_shared(inode)) {
> > +               if (!kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > +                       goto err;
> > +
> > +               if (slot->userspace_addr &&
> > +                   !kvm_gmem_is_same_range(kvm, slot, file, offset))
> > +                       goto err;
>
> This is very nit-picky, but I would rather this not be -EINVAL, maybe
> -EIO instead? Or maybe a pr_warn_once() and let the call proceed?
>
> The userspace_addr we got isn't invalid per se, we're just trying to
> give a hint to the user that their VMAs (or the userspace address they
> gave us) are messed up. I don't really like lumping this in with truly
> invalid arguments.

I don't mind changing the return error, but I don't think that we
should have a kernel warning (pr_warn_once) for something userspace
can trigger.

It's not an IO error either. I think that this is an invalid argument
(EINVAL). That said, other than opposing the idea of pr_warn, I am
happy to change it.

Cheers,
/fuad

> > +       }
> >
> >         filemap_invalidate_lock(inode->i_mapping);
> >
> > --
> > 2.49.0.1045.g170613ef41-goog
> >



* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
  2025-05-13 18:37   ` Ackerley Tng
@ 2025-05-14  8:03   ` Shivank Garg
  2025-05-14  9:45     ` Fuad Tabba
  2025-05-14 10:07   ` Roy, Patrick
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 88+ messages in thread
From: Shivank Garg @ 2025-05-14  8:03 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 5/13/2025 10:04 PM, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory at the host userspace. This support is gated by the
> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> guest_memfd instance.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 10 ++++
>  include/linux/kvm_host.h        | 13 +++++
>  include/uapi/linux/kvm.h        |  1 +
>  virt/kvm/Kconfig                |  5 ++
>  virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
>  5 files changed, 117 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 709cc2a7ba66..f72722949cae 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  
>  #ifdef CONFIG_KVM_GMEM
>  #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> +
> +/*
> + * CoCo VMs with hardware support that use guest_memfd only for backing private
> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> + */
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm)			\
> +	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&			\
> +	 ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||		\
> +	  (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
>  #else
>  #define kvm_arch_supports_gmem(kvm) false
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
>  #endif
>  
>  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ae70e4e19700..2ec89c214978 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>  }
>  #endif
>  
> +/*
> + * Returns true if this VM supports shared mem in guest_memfd.
> + *
> + * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
> + * guest_memfd is enabled.
> + */
> +#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> +{
> +	return false;
> +}
> +#endif
> +
>  #ifndef kvm_arch_has_readonly_mem
>  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>  {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6ae8ad8934b..9857022a0f0c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>  
>  #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1UL << 0)
>  
>  struct kvm_create_guest_memfd {
>  	__u64 size;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..f4e469a62a60 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>  config HAVE_KVM_ARCH_GMEM_INVALIDATE
>         bool
>         depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> +       select KVM_GMEM
> +       bool
> +       prompt "Enables in-place shared memory for guest_memfd"

Hi,

I noticed the following warnings with checkpatch.pl:

WARNING: Argument 'kvm' is not used in function-like macro
#42: FILE: arch/x86/include/asm/kvm_host.h:2269:
+#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false

WARNING: please write a help paragraph that fully describes the config symbol with at least 4 lines
#91: FILE: virt/kvm/Kconfig:132:
+config KVM_GMEM_SHARED_MEM
+       select KVM_GMEM
+       bool
+       prompt "Enables in-place shared memory for guest_memfd"

0003-KVM-Rename-kvm_arch_has_private_mem-to-kvm_arch_supp.patch
-----------------------------------------------------------------------------
WARNING: Argument 'kvm' is not used in function-like macro
#35: FILE: arch/x86/include/asm/kvm_host.h:2259:
+#define kvm_arch_supports_gmem(kvm) false

total: 0 errors, 1 warnings, 91 lines checked

Please let me know if these are ignored intentionally - if so, sorry for the noise.

Best Regards,
Shivank


> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..8e6d1866b55e 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>  	return gfn - slot->base_gfn + slot->gmem.pgoff;
>  }
>  
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +
> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> +	uint64_t flags = (uint64_t)inode->i_private;
> +
> +	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	struct folio *folio;
> +	vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +	filemap_invalidate_lock_shared(inode->i_mapping);
> +
> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +	if (IS_ERR(folio)) {
> +		int err = PTR_ERR(folio);
> +
> +		if (err == -EAGAIN)
> +			ret = VM_FAULT_RETRY;
> +		else
> +			ret = vmf_error(err);
> +
> +		goto out_filemap;
> +	}
> +
> +	if (folio_test_hwpoison(folio)) {
> +		ret = VM_FAULT_HWPOISON;
> +		goto out_folio;
> +	}
> +
> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	if (!folio_test_uptodate(folio)) {
> +		clear_highpage(folio_page(folio, 0));
> +		kvm_gmem_mark_prepared(folio);
> +	}
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +	if (ret != VM_FAULT_LOCKED) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
> +out_filemap:
> +	filemap_invalidate_unlock_shared(inode->i_mapping);
> +
> +	return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +	.fault = kvm_gmem_fault_shared,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	if (!kvm_gmem_supports_shared(file_inode(file)))
> +		return -ENODEV;
> +
> +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> +	    (VM_SHARED | VM_MAYSHARE)) {
> +		return -EINVAL;
> +	}
> +
> +	vma->vm_ops = &kvm_gmem_vm_ops;
> +
> +	return 0;
> +}
> +#else
> +#define kvm_gmem_mmap NULL
> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> +
>  static struct file_operations kvm_gmem_fops = {
> +	.mmap		= kvm_gmem_mmap,
>  	.open		= generic_file_open,
>  	.release	= kvm_gmem_release,
>  	.fallocate	= kvm_gmem_fallocate,
> @@ -463,6 +544,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  	u64 flags = args->flags;
>  	u64 valid_flags = 0;
>  
> +	if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +		valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
>  	if (flags & ~valid_flags)
>  		return -EINVAL;
>  
> @@ -501,6 +585,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>  	    offset + size > i_size_read(inode))
>  		goto err;
>  
> +	if (kvm_gmem_supports_shared(inode) &&
> +	    !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +		goto err;
> +
>  	filemap_invalidate_lock(inode->i_mapping);
>  
>  	start = offset >> PAGE_SHIFT;




* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-14  8:03   ` Shivank Garg
@ 2025-05-14  9:45     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-14  9:45 UTC (permalink / raw)
  To: Shivank Garg
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Thanks Shivank,

On Wed, 14 May 2025 at 09:03, Shivank Garg <shivankg@amd.com> wrote:
>
> On 5/13/2025 10:04 PM, Fuad Tabba wrote:
> > This patch enables support for shared memory in guest_memfd, including
> > mapping that memory at the host userspace. This support is gated by the
> > configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> > flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> > guest_memfd instance.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 10 ++++
> >  include/linux/kvm_host.h        | 13 +++++
> >  include/uapi/linux/kvm.h        |  1 +
> >  virt/kvm/Kconfig                |  5 ++
> >  virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
> >  5 files changed, 117 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 709cc2a7ba66..f72722949cae 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >
> >  #ifdef CONFIG_KVM_GMEM
> >  #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> > +
> > +/*
> > + * CoCo VMs with hardware support that use guest_memfd only for backing private
> > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> > + */
> > +#define kvm_arch_vm_supports_gmem_shared_mem(kvm)                    \
> > +     (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
> > +      ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
> > +       (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
> >  #else
> >  #define kvm_arch_supports_gmem(kvm) false
> > +#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
> >  #endif
> >
> >  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index ae70e4e19700..2ec89c214978 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> >  }
> >  #endif
> >
> > +/*
> > + * Returns true if this VM supports shared mem in guest_memfd.
> > + *
> > + * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
> > + * guest_memfd is enabled.
> > + */
> > +#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> > +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> > +{
> > +     return false;
> > +}
> > +#endif
> > +
> >  #ifndef kvm_arch_has_readonly_mem
> >  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
> >  {
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index b6ae8ad8934b..9857022a0f0c 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
> >  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> >
> >  #define KVM_CREATE_GUEST_MEMFD       _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> > +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED      (1UL << 0)
> >
> >  struct kvm_create_guest_memfd {
> >       __u64 size;
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 559c93ad90be..f4e469a62a60 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> >  config HAVE_KVM_ARCH_GMEM_INVALIDATE
> >         bool
> >         depends on KVM_GMEM
> > +
> > +config KVM_GMEM_SHARED_MEM
> > +       select KVM_GMEM
> > +       bool
> > +       prompt "Enables in-place shared memory for guest_memfd"
>
> Hi,
>
> I noticed following warnings with checkpatch.pl:
>
> WARNING: Argument 'kvm' is not used in function-like macro
> #42: FILE: arch/x86/include/asm/kvm_host.h:2269:
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
>
> WARNING: please write a help paragraph that fully describes the config symbol with at least 4 lines
> #91: FILE: virt/kvm/Kconfig:132:
> +config KVM_GMEM_SHARED_MEM
> +       select KVM_GMEM
> +       bool
> +       prompt "Enables in-place shared memory for guest_memfd"
>
> 0003-KVM-Rename-kvm_arch_has_private_mem-to-kvm_arch_supp.patch
> -----------------------------------------------------------------------------
> WARNING: Argument 'kvm' is not used in function-like macro
> #35: FILE: arch/x86/include/asm/kvm_host.h:2259:
> +#define kvm_arch_supports_gmem(kvm) false
>
> total: 0 errors, 1 warnings, 91 lines checked
>
> Please let me know if these are ignored intentionally - if so, sorry for the noise.

Yes, I did intentionally ignore these.
kvm_arch_vm_supports_gmem_shared_mem() follows the same pattern as
kvm_arch_supports_gmem(). As for the Kconfig help text, I couldn't think
of four lines to describe the config option that aren't just fluff :)

Cheers,
/fuad
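For the record, a help paragraph that would satisfy checkpatch's four-line
minimum might read as follows (illustrative wording only, not taken from
the series):

```kconfig
config KVM_GMEM_SHARED_MEM
       select KVM_GMEM
       bool
       prompt "Enables in-place shared memory for guest_memfd"
       help
         Enables mmap() support for guest_memfd so that host userspace
         can map guest memory that is shared in-place with the guest.
         Only VM types whose architecture code opts in (e.g. software-
         protected VMs) can create guest_memfd instances with the
         GUEST_MEMFD_FLAG_SUPPORT_SHARED flag set.
```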

> Best Regards,
> Shivank
>
>
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 6db515833f61..8e6d1866b55e 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> >       return gfn - slot->base_gfn + slot->gmem.pgoff;
> >  }
> >
> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > +
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > +     uint64_t flags = (uint64_t)inode->i_private;
> > +
> > +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> > +{
> > +     struct inode *inode = file_inode(vmf->vma->vm_file);
> > +     struct folio *folio;
> > +     vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +     filemap_invalidate_lock_shared(inode->i_mapping);
> > +
> > +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +     if (IS_ERR(folio)) {
> > +             int err = PTR_ERR(folio);
> > +
> > +             if (err == -EAGAIN)
> > +                     ret = VM_FAULT_RETRY;
> > +             else
> > +                     ret = vmf_error(err);
> > +
> > +             goto out_filemap;
> > +     }
> > +
> > +     if (folio_test_hwpoison(folio)) {
> > +             ret = VM_FAULT_HWPOISON;
> > +             goto out_folio;
> > +     }
> > +
> > +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +             ret = VM_FAULT_SIGBUS;
> > +             goto out_folio;
> > +     }
> > +
> > +     if (!folio_test_uptodate(folio)) {
> > +             clear_highpage(folio_page(folio, 0));
> > +             kvm_gmem_mark_prepared(folio);
> > +     }
> > +
> > +     vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +     if (ret != VM_FAULT_LOCKED) {
> > +             folio_unlock(folio);
> > +             folio_put(folio);
> > +     }
> > +
> > +out_filemap:
> > +     filemap_invalidate_unlock_shared(inode->i_mapping);
> > +
> > +     return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > +     .fault = kvm_gmem_fault_shared,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +     if (!kvm_gmem_supports_shared(file_inode(file)))
> > +             return -ENODEV;
> > +
> > +     if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > +         (VM_SHARED | VM_MAYSHARE)) {
> > +             return -EINVAL;
> > +     }
> > +
> > +     vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > +     return 0;
> > +}
> > +#else
> > +#define kvm_gmem_mmap NULL
> > +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> > +
> >  static struct file_operations kvm_gmem_fops = {
> > +     .mmap           = kvm_gmem_mmap,
> >       .open           = generic_file_open,
> >       .release        = kvm_gmem_release,
> >       .fallocate      = kvm_gmem_fallocate,
> > @@ -463,6 +544,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> >       u64 flags = args->flags;
> >       u64 valid_flags = 0;
> >
> > +     if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > +             valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +
> >       if (flags & ~valid_flags)
> >               return -EINVAL;
> >
> > @@ -501,6 +585,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> >           offset + size > i_size_read(inode))
> >               goto err;
> >
> > +     if (kvm_gmem_supports_shared(inode) &&
> > +         !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > +             goto err;
> > +
> >       filemap_invalidate_lock(inode->i_mapping);
> >
> >       start = offset >> PAGE_SHIFT;
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
  2025-05-13 18:37   ` Ackerley Tng
  2025-05-14  8:03   ` Shivank Garg
@ 2025-05-14 10:07   ` Roy, Patrick
  2025-05-14 11:30     ` Fuad Tabba
  2025-05-14 20:40   ` James Houghton
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 88+ messages in thread
From: Roy, Patrick @ 2025-05-14 10:07 UTC (permalink / raw)
  To: tabba@google.com
  Cc: ackerleytng@google.com, akpm@linux-foundation.org,
	amoorthy@google.com, anup@brainfault.org, aou@eecs.berkeley.edu,
	brauner@kernel.org, catalin.marinas@arm.com,
	chao.p.peng@linux.intel.com, chenhuacai@kernel.org,
	david@redhat.com, dmatlack@google.com, fvdl@google.com,
	hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
	isaku.yamahata@gmail.com, isaku.yamahata@intel.com,
	james.morse@arm.com, jarkko@kernel.org, jgg@nvidia.com,
	jhubbard@nvidia.com, jthoughton@google.com, keirf@google.com,
	kirill.shutemov@linux.intel.com, kvm@vger.kernel.org,
	liam.merwick@oracle.com, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, mail@maciej.szmigiero.name, maz@kernel.org,
	mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au,
	oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com,
	paul.walmsley@sifive.com, pbonzini@redhat.com, peterx@redhat.com,
	qperret@google.com, quic_cvanscha@quicinc.com,
	quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com,
	rientjes@google.com, Roy, Patrick, seanjc@google.com,
	shuah@kernel.org, steven.price@arm.com, suzuki.poulose@arm.com,
	vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
	wei.w.wang@intel.com, will@kernel.org, willy@infradead.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com

On Tue, 2025-05-13 at 17:34 +0100, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory at the host userspace. This support is gated by the
> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> guest_memfd instance.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 10 ++++
>  include/linux/kvm_host.h        | 13 +++++
>  include/uapi/linux/kvm.h        |  1 +
>  virt/kvm/Kconfig                |  5 ++
>  virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
>  5 files changed, 117 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 709cc2a7ba66..f72722949cae 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> 
>  #ifdef CONFIG_KVM_GMEM
>  #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> +
> +/*
> + * CoCo VMs with hardware support that use guest_memfd only for backing private
> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> + */
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm)                      \
> +       (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
> +        ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
> +         (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))

I forgot what we ended up deciding wrt "allow guest_memfd usage for default VMs
on x86" in the call two weeks ago, but if we want to do that as part of this
series, then this also needs 

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 12433b1e755b..904b15c678d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12716,7 +12716,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
                return -EINVAL;
 
        kvm->arch.vm_type = type;
-       kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
+       kvm->arch.supports_gmem = type == KVM_X86_SW_PROTECTED_VM || type == KVM_X86_DEFAULT_VM;
        /* Decided by the vendor code for other VM types.  */
        kvm->arch.pre_fault_allowed =
                type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;

and with that I was able to run my firecracker tests on top of this patch
series with X86_DEFAULT_VM. But I did wonder about this define in
x86/include/asm/kvm_host.h:

/* SMM is currently unsupported for guests with guest_memfd (esp private) memory. */
# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_supports_gmem(kvm) ? 1 : 2)

which I'm not really sure what to make of, but which I think means enabling
guest_memfd for X86_DEFAULT_VM isn't as straightforward as the above diff :/

Best,
Patrick

>  #else
>  #define kvm_arch_supports_gmem(kvm) false
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
>  #endif
> 
>  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ae70e4e19700..2ec89c214978 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>  }
>  #endif
> 
> +/*
> + * Returns true if this VM supports shared mem in guest_memfd.
> + *
> + * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
> + * guest_memfd is enabled.
> + */
> +#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> +{
> +       return false;
> +}
> +#endif
> +
>  #ifndef kvm_arch_has_readonly_mem
>  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>  {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6ae8ad8934b..9857022a0f0c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> 
>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED        (1UL << 0)
> 
>  struct kvm_create_guest_memfd {
>         __u64 size;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..f4e469a62a60 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>  config HAVE_KVM_ARCH_GMEM_INVALIDATE
>         bool
>         depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> +       select KVM_GMEM
> +       bool
> +       prompt "Enables in-place shared memory for guest_memfd"
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..8e6d1866b55e 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>         return gfn - slot->base_gfn + slot->gmem.pgoff;
>  }
> 
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +
> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> +       uint64_t flags = (uint64_t)inode->i_private;
> +
> +       return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> +       struct inode *inode = file_inode(vmf->vma->vm_file);
> +       struct folio *folio;
> +       vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +       filemap_invalidate_lock_shared(inode->i_mapping);
> +
> +       folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +       if (IS_ERR(folio)) {
> +               int err = PTR_ERR(folio);
> +
> +               if (err == -EAGAIN)
> +                       ret = VM_FAULT_RETRY;
> +               else
> +                       ret = vmf_error(err);
> +
> +               goto out_filemap;
> +       }
> +
> +       if (folio_test_hwpoison(folio)) {
> +               ret = VM_FAULT_HWPOISON;
> +               goto out_folio;
> +       }
> +
> +       if (WARN_ON_ONCE(folio_test_large(folio))) {
> +               ret = VM_FAULT_SIGBUS;
> +               goto out_folio;
> +       }
> +
> +       if (!folio_test_uptodate(folio)) {
> +               clear_highpage(folio_page(folio, 0));
> +               kvm_gmem_mark_prepared(folio);
> +       }
> +
> +       vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +       if (ret != VM_FAULT_LOCKED) {
> +               folio_unlock(folio);
> +               folio_put(folio);
> +       }
> +
> +out_filemap:
> +       filemap_invalidate_unlock_shared(inode->i_mapping);
> +
> +       return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +       .fault = kvm_gmem_fault_shared,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +       if (!kvm_gmem_supports_shared(file_inode(file)))
> +               return -ENODEV;
> +
> +       if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> +           (VM_SHARED | VM_MAYSHARE)) {
> +               return -EINVAL;
> +       }
> +
> +       vma->vm_ops = &kvm_gmem_vm_ops;
> +
> +       return 0;
> +}
> +#else
> +#define kvm_gmem_mmap NULL
> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> +
>  static struct file_operations kvm_gmem_fops = {
> +       .mmap           = kvm_gmem_mmap,
>         .open           = generic_file_open,
>         .release        = kvm_gmem_release,
>         .fallocate      = kvm_gmem_fallocate,
> @@ -463,6 +544,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>         u64 flags = args->flags;
>         u64 valid_flags = 0;
> 
> +       if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +               valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
>         if (flags & ~valid_flags)
>                 return -EINVAL;
> 
> @@ -501,6 +585,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>             offset + size > i_size_read(inode))
>                 goto err;
> 
> +       if (kvm_gmem_supports_shared(inode) &&
> +           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +               goto err;
> +
>         filemap_invalidate_lock(inode->i_mapping);
> 
>         start = offset >> PAGE_SHIFT;
> --
> 2.49.0.1045.g170613ef41-goog
> 




* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-14 10:07   ` Roy, Patrick
@ 2025-05-14 11:30     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-14 11:30 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: ackerleytng@google.com, akpm@linux-foundation.org,
	amoorthy@google.com, anup@brainfault.org, aou@eecs.berkeley.edu,
	brauner@kernel.org, catalin.marinas@arm.com,
	chao.p.peng@linux.intel.com, chenhuacai@kernel.org,
	david@redhat.com, dmatlack@google.com, fvdl@google.com,
	hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
	isaku.yamahata@gmail.com, isaku.yamahata@intel.com,
	james.morse@arm.com, jarkko@kernel.org, jgg@nvidia.com,
	jhubbard@nvidia.com, jthoughton@google.com, keirf@google.com,
	kirill.shutemov@linux.intel.com, kvm@vger.kernel.org,
	liam.merwick@oracle.com, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, mail@maciej.szmigiero.name, maz@kernel.org,
	mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au,
	oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com,
	paul.walmsley@sifive.com, pbonzini@redhat.com, peterx@redhat.com,
	qperret@google.com, quic_cvanscha@quicinc.com,
	quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com,
	rientjes@google.com, seanjc@google.com, shuah@kernel.org,
	steven.price@arm.com, suzuki.poulose@arm.com,
	vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
	wei.w.wang@intel.com, will@kernel.org, willy@infradead.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com

Hi Patrick,

On Wed, 14 May 2025 at 11:07, Roy, Patrick <roypat@amazon.co.uk> wrote:
>
> On Tue, 2025-05-13 at 17:34 +0100, Fuad Tabba wrote:
> > This patch enables support for shared memory in guest_memfd, including
> > mapping that memory at the host userspace. This support is gated by the
> > configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> > flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> > guest_memfd instance.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 10 ++++
> >  include/linux/kvm_host.h        | 13 +++++
> >  include/uapi/linux/kvm.h        |  1 +
> >  virt/kvm/Kconfig                |  5 ++
> >  virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
> >  5 files changed, 117 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 709cc2a7ba66..f72722949cae 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >
> >  #ifdef CONFIG_KVM_GMEM
> >  #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> > +
> > +/*
> > + * CoCo VMs with hardware support that use guest_memfd only for backing private
> > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> > + */
> > +#define kvm_arch_vm_supports_gmem_shared_mem(kvm)                      \
> > +       (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
> > +        ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
> > +         (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
>
> I forgot what we ended up deciding wrt "allow guest_memfd usage for default VMs
> on x86" in the call two weeks ago, but if we want to do that as part of this
> series, then this also needs

Yes we did. I missed it in this patch. I'll fix it.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 12433b1e755b..904b15c678d6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12716,7 +12716,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>                 return -EINVAL;
>
>         kvm->arch.vm_type = type;
> -       kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
> +       kvm->arch.supports_gmem = type == KVM_X86_SW_PROTECTED_VM || type == KVM_X86_DEFAULT_VM;
>         /* Decided by the vendor code for other VM types.  */
>         kvm->arch.pre_fault_allowed =
>                 type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
>
> and with that I was able to run my firecracker tests on top of this patch
> series with X86_DEFAULT_VM. But I did wonder about this define in
> x86/include/asm/kvm_host.h:
>
> /* SMM is currently unsupported for guests with guest_memfd (esp private) memory. */
> # define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_supports_gmem(kvm) ? 1 : 2)
>
> which I'm not really sure what to make of, but which I think means enabling
> guest_memfd for X86_DEFAULT_VM isn't as straight-forward as the above diff :/

Not quite, but I'll sort it out.

Thanks,
/fuad

> Best,
> Patrick
>
> >  #else
> >  #define kvm_arch_supports_gmem(kvm) false
> > +#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
> >  #endif
> >
> >  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index ae70e4e19700..2ec89c214978 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> >  }
> >  #endif
> >
> > +/*
> > + * Returns true if this VM supports shared mem in guest_memfd.
> > + *
> > + * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
> > + * guest_memfd is enabled.
> > + */
> > +#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> > +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> > +{
> > +       return false;
> > +}
> > +#endif
> > +
> >  #ifndef kvm_arch_has_readonly_mem
> >  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
> >  {
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index b6ae8ad8934b..9857022a0f0c 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
> >  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> >
> >  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> > +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED        (1UL << 0)
> >
> >  struct kvm_create_guest_memfd {
> >         __u64 size;
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 559c93ad90be..f4e469a62a60 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> >  config HAVE_KVM_ARCH_GMEM_INVALIDATE
> >         bool
> >         depends on KVM_GMEM
> > +
> > +config KVM_GMEM_SHARED_MEM
> > +       select KVM_GMEM
> > +       bool
> > +       prompt "Enables in-place shared memory for guest_memfd"
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 6db515833f61..8e6d1866b55e 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> >         return gfn - slot->base_gfn + slot->gmem.pgoff;
> >  }
> >
> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > +
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > +       uint64_t flags = (uint64_t)inode->i_private;
> > +
> > +       return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> > +{
> > +       struct inode *inode = file_inode(vmf->vma->vm_file);
> > +       struct folio *folio;
> > +       vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +       filemap_invalidate_lock_shared(inode->i_mapping);
> > +
> > +       folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +       if (IS_ERR(folio)) {
> > +               int err = PTR_ERR(folio);
> > +
> > +               if (err == -EAGAIN)
> > +                       ret = VM_FAULT_RETRY;
> > +               else
> > +                       ret = vmf_error(err);
> > +
> > +               goto out_filemap;
> > +       }
> > +
> > +       if (folio_test_hwpoison(folio)) {
> > +               ret = VM_FAULT_HWPOISON;
> > +               goto out_folio;
> > +       }
> > +
> > +       if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +               ret = VM_FAULT_SIGBUS;
> > +               goto out_folio;
> > +       }
> > +
> > +       if (!folio_test_uptodate(folio)) {
> > +               clear_highpage(folio_page(folio, 0));
> > +               kvm_gmem_mark_prepared(folio);
> > +       }
> > +
> > +       vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +       if (ret != VM_FAULT_LOCKED) {
> > +               folio_unlock(folio);
> > +               folio_put(folio);
> > +       }
> > +
> > +out_filemap:
> > +       filemap_invalidate_unlock_shared(inode->i_mapping);
> > +
> > +       return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > +       .fault = kvm_gmem_fault_shared,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +       if (!kvm_gmem_supports_shared(file_inode(file)))
> > +               return -ENODEV;
> > +
> > +       if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > +           (VM_SHARED | VM_MAYSHARE)) {
> > +               return -EINVAL;
> > +       }
> > +
> > +       vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > +       return 0;
> > +}
> > +#else
> > +#define kvm_gmem_mmap NULL
> > +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> > +
> >  static struct file_operations kvm_gmem_fops = {
> > +       .mmap           = kvm_gmem_mmap,
> >         .open           = generic_file_open,
> >         .release        = kvm_gmem_release,
> >         .fallocate      = kvm_gmem_fallocate,
> > @@ -463,6 +544,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> >         u64 flags = args->flags;
> >         u64 valid_flags = 0;
> >
> > +       if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > +               valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +
> >         if (flags & ~valid_flags)
> >                 return -EINVAL;
> >
> > @@ -501,6 +585,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> >             offset + size > i_size_read(inode))
> >                 goto err;
> >
> > +       if (kvm_gmem_supports_shared(inode) &&
> > +           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > +               goto err;
> > +
> >         filemap_invalidate_lock(inode->i_mapping);
> >
> >         start = offset >> PAGE_SHIFT;
> > --
> > 2.49.0.1045.g170613ef41-goog
> >
>



* Re: [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range
  2025-05-14  7:33     ` Fuad Tabba
@ 2025-05-14 13:32       ` Sean Christopherson
  2025-05-14 13:47         ` Ackerley Tng
  0 siblings, 1 reply; 88+ messages in thread
From: Sean Christopherson @ 2025-05-14 13:32 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: James Houghton, kvm, linux-arm-msm, linux-mm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
	pankaj.gupta, ira.weiny

On Wed, May 14, 2025, Fuad Tabba wrote:
> On Tue, 13 May 2025 at 21:31, James Houghton <jthoughton@google.com> wrote:
> >
> > On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote:
> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > index 8e6d1866b55e..2f499021df66 100644
> > > --- a/virt/kvm/guest_memfd.c
> > > +++ b/virt/kvm/guest_memfd.c
> > > @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> > >         return __kvm_gmem_create(kvm, size, flags);
> > >  }
> > >
> > > +static bool kvm_gmem_is_same_range(struct kvm *kvm,
> > > +                                  struct kvm_memory_slot *slot,
> > > +                                  struct file *file, loff_t offset)
> > > +{
> > > +       struct mm_struct *mm = kvm->mm;
> > > +       loff_t userspace_addr_offset;
> > > +       struct vm_area_struct *vma;
> > > +       bool ret = false;
> > > +
> > > +       mmap_read_lock(mm);
> > > +
> > > +       vma = vma_lookup(mm, slot->userspace_addr);
> > > +       if (!vma)
> > > +               goto out;
> > > +
> > > +       if (vma->vm_file != file)
> > > +               goto out;
> > > +
> > > +       userspace_addr_offset = slot->userspace_addr - vma->vm_start;
> > > +       ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset;
> > > +out:
> > > +       mmap_read_unlock(mm);
> > > +
> > > +       return ret;
> > > +}
> > > +
> > >  int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> > >                   unsigned int fd, loff_t offset)
> > >  {
> > > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> > >             offset + size > i_size_read(inode))
> > >                 goto err;
> > >
> > > -       if (kvm_gmem_supports_shared(inode) &&
> > > -           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > > -               goto err;
> > > +       if (kvm_gmem_supports_shared(inode)) {
> > > +               if (!kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > > +                       goto err;
> > > +
> > > +               if (slot->userspace_addr &&
> > > +                   !kvm_gmem_is_same_range(kvm, slot, file, offset))
> > > +                       goto err;
> >
> > This is very nit-picky, but I would rather this not be -EINVAL, maybe
> > -EIO instead? Or maybe a pr_warn_once() and let the call proceed?

Or just omit the check entirely.  The check isn't binding (ba-dump, ching!),
because the mapping/VMA can change the instant mmap_read_unlock() is called.

> > The userspace_addr we got isn't invalid per se, we're just trying to
> > give a hint to the user that their VMAs (or the userspace address they
> > gave us) are messed up. I don't really like lumping this in with truly
> > invalid arguments.
> 
> I don't mind changing the return error, but I don't think that we
> should have a kernel warning (pr_warn_once) for something userspace
> can trigger.

This isn't a WARN, e.g. won't trip panic_on_warn.  In practice, it's not
meaningfully different than pr_info().  That said, I agree that printing anything
is a bad approach.

> It's not an IO error either. I think that this is an invalid argument
> (EINVAL).

I agree with James, this isn't an invalid argument.  Having the validity of an
input hinge on the ordering between a KVM ioctl() and mmap() is quite odd.  I
know KVM arm64 does exactly this for KVM_SET_USER_MEMORY_REGION{,2}, but I don't
love the semantics.  And unlike that scenario, where e.g. MTE tags are verified
again at fault-time, KVM won't re-check the VMA when accessing guest memory via
the userspace mapping, e.g. through uaccess.

Unless I'm forgetting something, I'm leaning toward omitting the check entirely.

> That said, other than opposing the idea of pr_warn, I am happy to change it.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range
  2025-05-14 13:32       ` Sean Christopherson
@ 2025-05-14 13:47         ` Ackerley Tng
  2025-05-14 13:52           ` Sean Christopherson
  0 siblings, 1 reply; 88+ messages in thread
From: Ackerley Tng @ 2025-05-14 13:47 UTC (permalink / raw)
  To: Sean Christopherson, Fuad Tabba
  Cc: James Houghton, kvm, linux-arm-msm, linux-mm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

Sean Christopherson <seanjc@google.com> writes:

> On Wed, May 14, 2025, Fuad Tabba wrote:
>> On Tue, 13 May 2025 at 21:31, James Houghton <jthoughton@google.com> wrote:
>> >
>> > On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote:
>> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> > > index 8e6d1866b55e..2f499021df66 100644
>> > > --- a/virt/kvm/guest_memfd.c
>> > > +++ b/virt/kvm/guest_memfd.c
>> > > @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>> > >         return __kvm_gmem_create(kvm, size, flags);
>> > >  }
>> > >
>> > > +static bool kvm_gmem_is_same_range(struct kvm *kvm,
>> > > +                                  struct kvm_memory_slot *slot,
>> > > +                                  struct file *file, loff_t offset)
>> > > +{
>> > > +       struct mm_struct *mm = kvm->mm;
>> > > +       loff_t userspace_addr_offset;
>> > > +       struct vm_area_struct *vma;
>> > > +       bool ret = false;
>> > > +
>> > > +       mmap_read_lock(mm);
>> > > +
>> > > +       vma = vma_lookup(mm, slot->userspace_addr);
>> > > +       if (!vma)
>> > > +               goto out;
>> > > +
>> > > +       if (vma->vm_file != file)
>> > > +               goto out;
>> > > +
>> > > +       userspace_addr_offset = slot->userspace_addr - vma->vm_start;
>> > > +       ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset;
>> > > +out:
>> > > +       mmap_read_unlock(mm);
>> > > +
>> > > +       return ret;
>> > > +}
>> > > +
>> > >  int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>> > >                   unsigned int fd, loff_t offset)
>> > >  {
>> > > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>> > >             offset + size > i_size_read(inode))
>> > >                 goto err;
>> > >
>> > > -       if (kvm_gmem_supports_shared(inode) &&
>> > > -           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
>> > > -               goto err;
>> > > +       if (kvm_gmem_supports_shared(inode)) {
>> > > +               if (!kvm_arch_vm_supports_gmem_shared_mem(kvm))
>> > > +                       goto err;
>> > > +
>> > > +               if (slot->userspace_addr &&
>> > > +                   !kvm_gmem_is_same_range(kvm, slot, file, offset))
>> > > +                       goto err;
>> >
>> > This is very nit-picky, but I would rather this not be -EINVAL, maybe
>> > -EIO instead? Or maybe a pr_warn_once() and let the call proceed?
>
> Or just omit the check entirely.  The check isn't binding (ba-dump, ching!),
> because the mapping/VMA can change the instant mmap_read_unlock() is called.
>
>> > The userspace_addr we got isn't invalid per se, we're just trying to
>> > give a hint to the user that their VMAs (or the userspace address they
>> > gave us) are messed up. I don't really like lumping this in with truly
>> > invalid arguments.
>> 
>> I don't mind changing the return error, but I don't think that we
>> should have a kernel warning (pr_warn_once) for something userspace
>> can trigger.
>
> This isn't a WARN, e.g. won't trip panic_on_warn.  In practice, it's not
> meaningfully different than pr_info().  That said, I agree that printing anything
> is a bad approach.
>
>> It's not an IO error either. I think that this is an invalid argument
>> (EINVAL).
>
> I agree with James, this isn't an invalid argument.  Having the validity of an
> input hinge on the ordering between a KVM ioctl() and mmap() is quite odd.  I
> know KVM arm64 does exactly this for KVM_SET_USER_MEMORY_REGION{,2}, but I don't
> love the semantics.  And unlike that scenario, where e.g. MTE tags are verified
> again at fault-time, KVM won't re-check the VMA when accessing guest memory via
> the userspace mapping, e.g. through uaccess.
>
> Unless I'm forgetting something, I'm leaning toward omitting the check entirely.
>

I'm good with dropping this patch. I might have misunderstood the
conclusion of the guest_memfd call.

>> That said, other than opposing the idea of pr_warn, I am happy to change it.



* Re: [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range
  2025-05-14 13:47         ` Ackerley Tng
@ 2025-05-14 13:52           ` Sean Christopherson
  0 siblings, 0 replies; 88+ messages in thread
From: Sean Christopherson @ 2025-05-14 13:52 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Fuad Tabba, James Houghton, kvm, linux-arm-msm, linux-mm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

On Wed, May 14, 2025, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
> > On Wed, May 14, 2025, Fuad Tabba wrote:
> >> On Tue, 13 May 2025 at 21:31, James Houghton <jthoughton@google.com> wrote:
> >> > > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> >> > >             offset + size > i_size_read(inode))
> >> > >                 goto err;
> >> > >
> >> > > -       if (kvm_gmem_supports_shared(inode) &&
> >> > > -           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> >> > > -               goto err;
> >> > > +       if (kvm_gmem_supports_shared(inode)) {
> >> > > +               if (!kvm_arch_vm_supports_gmem_shared_mem(kvm))
> >> > > +                       goto err;
> >> > > +
> >> > > +               if (slot->userspace_addr &&
> >> > > +                   !kvm_gmem_is_same_range(kvm, slot, file, offset))
> >> > > +                       goto err;
> >> >
> >> > This is very nit-picky, but I would rather this not be -EINVAL, maybe
> >> > -EIO instead? Or maybe a pr_warn_once() and let the call proceed?
> >
> > Or just omit the check entirely.  The check isn't binding (ba-dump, ching!),
> > because the mapping/VMA can change the instant mmap_read_unlock() is called.
> >
> >> > The userspace_addr we got isn't invalid per se, we're just trying to
> >> > give a hint to the user that their VMAs (or the userspace address they
> >> > gave us) are messed up. I don't really like lumping this in with truly
> >> > invalid arguments.
> >> 
> >> I don't mind changing the return error, but I don't think that we
> >> should have a kernel warning (pr_warn_once) for something userspace
> >> can trigger.
> >
> > This isn't a WARN, e.g. won't trip panic_on_warn.  In practice, it's not
> > meaningfully different than pr_info().  That said, I agree that printing anything
> > is a bad approach.
> >
> >> It's not an IO error either. I think that this is an invalid argument
> >> (EINVAL).
> >
> > I agree with James, this isn't an invalid argument.  Having the validity of an
> > input hinge on the ordering between a KVM ioctl() and mmap() is quite odd.  I
> > know KVM arm64 does exactly this for KVM_SET_USER_MEMORY_REGION{,2}, but I don't
> > love the semantics.  And unlike that scenario, where e.g. MTE tags are verified
> > again at fault-time, KVM won't re-check the VMA when accessing guest memory via
> > the userspace mapping, e.g. through uaccess.
> >
> > Unless I'm forgetting something, I'm leaning toward omitting the check entirely.
> >
> 
> I'm good with dropping this patch. I might have misunderstood the conclusion
> of the guest_memfd call.

No, I don't think you misunderstood anything.  It's just that sometimes opinions
differ when there's actual code, versus a verbal discussion.  I.e. this sounds
like a good idea, but when seeing the code and thinking through the effects, it's
less appealing.



* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-13 16:34 ` [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd Fuad Tabba
  2025-05-14  7:13   ` Shivank Garg
@ 2025-05-14 15:27   ` kernel test robot
  2025-05-21  8:01   ` David Hildenbrand
  2 siblings, 0 replies; 88+ messages in thread
From: kernel test robot @ 2025-05-14 15:27 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: oe-kbuild-all, pbonzini, chenhuacai, mpe, anup, paul.walmsley,
	palmer, aou, seanjc, viro, brauner, willy, akpm, xiaoyao.li,
	yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata,
	mic, vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang

Hi Fuad,

kernel test robot noticed the following build errors:

[auto build test ERROR on 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3]

url:    https://github.com/intel-lab-lkp/linux/commits/Fuad-Tabba/KVM-Rename-CONFIG_KVM_PRIVATE_MEM-to-CONFIG_KVM_GMEM/20250514-003900
base:   82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
patch link:    https://lore.kernel.org/r/20250513163438.3942405-11-tabba%40google.com
patch subject: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
config: x86_64-buildonly-randconfig-002-20250514 (https://download.01.org/0day-ci/archive/20250514/202505142334.6dQb5Sei-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250514/202505142334.6dQb5Sei-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505142334.6dQb5Sei-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/x86/kvm/mmu/mmu.c: In function 'kvm_mmu_max_mapping_level':
>> arch/x86/kvm/mmu/mmu.c:3315:14: error: implicit declaration of function 'kvm_get_memory_attributes' [-Werror=implicit-function-declaration]
    3315 |              kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
         |              ^~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/kvm_get_memory_attributes +3315 arch/x86/kvm/mmu/mmu.c

  3303	
  3304	int kvm_mmu_max_mapping_level(struct kvm *kvm,
  3305				      const struct kvm_memory_slot *slot, gfn_t gfn)
  3306	{
  3307		int max_level;
  3308	
  3309		max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
  3310		if (max_level == PG_LEVEL_4K)
  3311			return PG_LEVEL_4K;
  3312	
  3313		if (kvm_slot_has_gmem(slot) &&
  3314		    (kvm_gmem_memslot_supports_shared(slot) ||
> 3315		     kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
  3316			return kvm_gmem_max_mapping_level(slot, gfn, max_level);
  3317		}
  3318	
  3319		return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
  3320	}
  3321	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range
  2025-05-13 16:34 ` [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range Fuad Tabba
  2025-05-13 20:30   ` James Houghton
@ 2025-05-14 17:39   ` David Hildenbrand
  1 sibling, 0 replies; 88+ messages in thread
From: David Hildenbrand @ 2025-05-14 17:39 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 13.05.25 18:34, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> On binding of a guest_memfd with a memslot, check that the slot's
> userspace_addr and the requested fd and offset refer to the same memory
> range.
> 
> This check is best-effort: nothing prevents userspace from later mapping
> other memory at the address provided in slot->userspace_addr and breaking
> guest operation.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   virt/kvm/guest_memfd.c | 37 ++++++++++++++++++++++++++++++++++---
>   1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 8e6d1866b55e..2f499021df66 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>   	return __kvm_gmem_create(kvm, size, flags);
>   }
>   
> +static bool kvm_gmem_is_same_range(struct kvm *kvm,
> +				   struct kvm_memory_slot *slot,
> +				   struct file *file, loff_t offset)
> +{
> +	struct mm_struct *mm = kvm->mm;
> +	loff_t userspace_addr_offset;
> +	struct vm_area_struct *vma;
> +	bool ret = false;
> +
> +	mmap_read_lock(mm);
> +
> +	vma = vma_lookup(mm, slot->userspace_addr);
> +	if (!vma)
> +		goto out;
> +
> +	if (vma->vm_file != file)
> +		goto out;
> +
> +	userspace_addr_offset = slot->userspace_addr - vma->vm_start;
> +	ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset;

You'd probably have to iterate over the whole range (which might span 
multiple VMAs), but reading the discussion, I'm fine with dropping this 
patch for now.

I think it's more important to document that thoroughly: what does it 
mean when we use GUEST_MEMFD_FLAG_SUPPORT_SHARED and then pass that fd 
in a memslot.

Skimming over patch #15, I assume this is properly documented in there.

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
                     ` (2 preceding siblings ...)
  2025-05-14 10:07   ` Roy, Patrick
@ 2025-05-14 20:40   ` James Houghton
  2025-05-15  7:25     ` Fuad Tabba
  2025-05-15 23:42   ` Gavin Shan
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 88+ messages in thread
From: James Houghton @ 2025-05-14 20:40 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny


On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote:
>
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory at the host userspace. This support is gated by the
> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> guest_memfd instance.
>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 10 ++++
>  include/linux/kvm_host.h        | 13 +++++
>  include/uapi/linux/kvm.h        |  1 +
>  virt/kvm/Kconfig                |  5 ++
>  virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
>  5 files changed, 117 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 709cc2a7ba66..f72722949cae 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>
>  #ifdef CONFIG_KVM_GMEM
>  #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> +
> +/*
> + * CoCo VMs with hardware support that use guest_memfd only for backing private
> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> + */
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm)                      \
> +       (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
> +        ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
> +         (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
>  #else
>  #define kvm_arch_supports_gmem(kvm) false
> +#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
>  #endif
>
>  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ae70e4e19700..2ec89c214978 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>  }
>  #endif
>
> +/*
> + * Returns true if this VM supports shared mem in guest_memfd.
> + *
> + * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
> + * guest_memfd is enabled.
> + */
> +#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> +{
> +       return false;
> +}
> +#endif
> +
>  #ifndef kvm_arch_has_readonly_mem
>  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>  {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6ae8ad8934b..9857022a0f0c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>
>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED        (1UL << 0)
>
>  struct kvm_create_guest_memfd {
>         __u64 size;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..f4e469a62a60 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>  config HAVE_KVM_ARCH_GMEM_INVALIDATE
>         bool
>         depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> +       select KVM_GMEM
> +       bool
> +       prompt "Enables in-place shared memory for guest_memfd"
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..8e6d1866b55e 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>         return gfn - slot->base_gfn + slot->gmem.pgoff;
>  }
>
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +
> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> +       uint64_t flags = (uint64_t)inode->i_private;
> +
> +       return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> +       struct inode *inode = file_inode(vmf->vma->vm_file);
> +       struct folio *folio;
> +       vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +       filemap_invalidate_lock_shared(inode->i_mapping);
> +
> +       folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +       if (IS_ERR(folio)) {
> +               int err = PTR_ERR(folio);
> +
> +               if (err == -EAGAIN)
> +                       ret = VM_FAULT_RETRY;
> +               else
> +                       ret = vmf_error(err);
> +
> +               goto out_filemap;
> +       }
> +
> +       if (folio_test_hwpoison(folio)) {
> +               ret = VM_FAULT_HWPOISON;
> +               goto out_folio;
> +       }
> +
> +       if (WARN_ON_ONCE(folio_test_large(folio))) {
> +               ret = VM_FAULT_SIGBUS;
> +               goto out_folio;
> +       }
> +
> +       if (!folio_test_uptodate(folio)) {
> +               clear_highpage(folio_page(folio, 0));
> +               kvm_gmem_mark_prepared(folio);
> +       }
> +
> +       vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +       if (ret != VM_FAULT_LOCKED) {
> +               folio_unlock(folio);
> +               folio_put(folio);
> +       }
> +
> +out_filemap:
> +       filemap_invalidate_unlock_shared(inode->i_mapping);
> +
> +       return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +       .fault = kvm_gmem_fault_shared,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +       if (!kvm_gmem_supports_shared(file_inode(file)))
> +               return -ENODEV;
> +
> +       if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> +           (VM_SHARED | VM_MAYSHARE)) {
> +               return -EINVAL;
> +       }
> +
> +       vma->vm_ops = &kvm_gmem_vm_ops;
> +
> +       return 0;
> +}
> +#else
> +#define kvm_gmem_mmap NULL
> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> +
>  static struct file_operations kvm_gmem_fops = {
> +       .mmap           = kvm_gmem_mmap,
>         .open           = generic_file_open,
>         .release        = kvm_gmem_release,
>         .fallocate      = kvm_gmem_fallocate,
> @@ -463,6 +544,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>         u64 flags = args->flags;
>         u64 valid_flags = 0;
>
> +       if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +               valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
>         if (flags & ~valid_flags)
>                 return -EINVAL;
>
> @@ -501,6 +585,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>             offset + size > i_size_read(inode))
>                 goto err;
>
> +       if (kvm_gmem_supports_shared(inode) &&

When building without CONFIG_KVM_GMEM_SHARED_MEM, my compiler
complains that kvm_gmem_supports_shared() is not defined.

> +           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> +               goto err;
> +
>         filemap_invalidate_lock(inode->i_mapping);
>
>         start = offset >> PAGE_SHIFT;
> --
> 2.49.0.1045.g170613ef41-goog
>



* Re: [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults
  2025-05-13 16:34 ` [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
@ 2025-05-14 21:26   ` James Houghton
  2025-05-15  9:27     ` Fuad Tabba
  2025-05-21  8:04   ` David Hildenbrand
  1 sibling, 1 reply; 88+ messages in thread
From: James Houghton @ 2025-05-14 21:26 UTC (permalink / raw)
  To: tabba
  Cc: ackerleytng, akpm, amoorthy, anup, aou, brauner, catalin.marinas,
	chao.p.peng, chenhuacai, david, dmatlack, fvdl, hch, hughd,
	ira.weiny, isaku.yamahata, isaku.yamahata, james.morse, jarkko,
	jgg, jhubbard, jthoughton, keirf, kirill.shutemov, kvm,
	liam.merwick, linux-arm-msm, linux-mm, mail, maz, mic,
	michael.roth, mpe, oliver.upton, palmer, pankaj.gupta,
	paul.walmsley, pbonzini, peterx, qperret, quic_cvanscha,
	quic_eberman, quic_mnalajal, quic_pderrin, quic_pheragu,
	quic_svaddagi, quic_tsoni, rientjes, roypat, seanjc, shuah,
	steven.price, suzuki.poulose, vannapurve, vbabka, viro,
	wei.w.wang, will, willy, xiaoyao.li, yilun.xu, yuzenghui

On Tue, May 13, 2025 at 9:35 AM Fuad Tabba <tabba@google.com> wrote:
>
> Add arm64 support for handling guest page faults on guest_memfd
> backed memslots.
>
> For now, the fault granule is restricted to PAGE_SIZE.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/kvm/mmu.c     | 94 +++++++++++++++++++++++++---------------
>  include/linux/kvm_host.h |  5 +++
>  virt/kvm/kvm_main.c      |  5 ---
>  3 files changed, 64 insertions(+), 40 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d756c2b5913f..9a48ef08491d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1466,6 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>         return vma->vm_flags & VM_MTE_ALLOWED;
>  }
>
> +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> +                            gfn_t gfn, bool write_fault, bool *writable,
> +                            struct page **page, bool is_gmem)
> +{
> +       kvm_pfn_t pfn;
> +       int ret;
> +
> +       if (!is_gmem)
> +               return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> +
> +       *writable = false;
> +
> +       ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> +       if (!ret) {
> +               *writable = !memslot_is_readonly(slot);
> +               return pfn;
> +       }
> +
> +       if (ret == -EHWPOISON)
> +               return KVM_PFN_ERR_HWPOISON;
> +
> +       return KVM_PFN_ERR_NOSLOT_MASK;

I don't think the above handling for the `ret != 0` case is correct. I think
we should just be returning `ret` out to userspace.

The diff I have below is closer to what I think we must do.

> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                           struct kvm_s2_trans *nested,
>                           struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1473,19 +1497,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  {
>         int ret = 0;
>         bool write_fault, writable;
> -       bool exec_fault, mte_allowed;
> +       bool exec_fault, mte_allowed = false;
>         bool device = false, vfio_allow_any_uc = false;
>         unsigned long mmu_seq;
>         phys_addr_t ipa = fault_ipa;
>         struct kvm *kvm = vcpu->kvm;
> -       struct vm_area_struct *vma;
> -       short page_shift;
> +       struct vm_area_struct *vma = NULL;
> +       short page_shift = PAGE_SHIFT;
>         void *memcache;
> -       gfn_t gfn;
> +       gfn_t gfn = ipa >> PAGE_SHIFT;
>         kvm_pfn_t pfn;
>         bool logging_active = memslot_is_logging(memslot);
> -       bool force_pte = logging_active || is_protected_kvm_enabled();
> -       long page_size, fault_granule;
> +       bool is_gmem = kvm_slot_has_gmem(memslot);
> +       bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> +       long page_size, fault_granule = PAGE_SIZE;
>         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>         struct kvm_pgtable *pgt;
>         struct page *page;
> @@ -1529,17 +1554,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          * Let's check if we will get back a huge page backed by hugetlbfs, or
>          * get block mapping for device MMIO region.
>          */
> -       mmap_read_lock(current->mm);
> -       vma = vma_lookup(current->mm, hva);
> -       if (unlikely(!vma)) {
> -               kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> -               mmap_read_unlock(current->mm);
> -               return -EFAULT;
> +       if (!is_gmem) {
> +               mmap_read_lock(current->mm);
> +               vma = vma_lookup(current->mm, hva);
> +               if (unlikely(!vma)) {
> +                       kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> +                       mmap_read_unlock(current->mm);
> +                       return -EFAULT;
> +               }
> +
> +               vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> +               mte_allowed = kvm_vma_mte_allowed(vma);
>         }
>
> -       if (force_pte)
> -               page_shift = PAGE_SHIFT;
> -       else
> +       if (!force_pte)
>                 page_shift = get_vma_page_shift(vma, hva);
>
>         switch (page_shift) {
> @@ -1605,27 +1633,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                 ipa &= ~(page_size - 1);
>         }
>
> -       gfn = ipa >> PAGE_SHIFT;
> -       mte_allowed = kvm_vma_mte_allowed(vma);
> -
> -       vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> -
> -       /* Don't use the VMA after the unlock -- it may have vanished */
> -       vma = NULL;
> +       if (!is_gmem) {
> +               /* Don't use the VMA after the unlock -- it may have vanished */
> +               vma = NULL;

I think we can just move the vma declaration inside the earlier `if (!is_gmem)`
block above. It should be really hard to accidentally attempt to use `vma` or
`hva` in the is_gmem case. For `vma` we can easily make it impossible; `hva` is
harder.

See below for what I think this should look like.

>
> -       /*
> -        * Read mmu_invalidate_seq so that KVM can detect if the results of
> -        * vma_lookup() or __kvm_faultin_pfn() become stale prior to
> -        * acquiring kvm->mmu_lock.
> -        *
> -        * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> -        * with the smp_wmb() in kvm_mmu_invalidate_end().
> -        */
> -       mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> -       mmap_read_unlock(current->mm);
> +               /*
> +                * Read mmu_invalidate_seq so that KVM can detect if the results
> +                * of vma_lookup() or faultin_pfn() become stale prior to
> +                * acquiring kvm->mmu_lock.
> +                *
> +                * Rely on mmap_read_unlock() for an implicit smp_rmb(), which
> +                * pairs with the smp_wmb() in kvm_mmu_invalidate_end().
> +                */
> +               mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> +               mmap_read_unlock(current->mm);
> +       }
>
> -       pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> -                               &writable, &page);
> +       pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
>         if (pfn == KVM_PFN_ERR_HWPOISON) {
>                 kvm_send_hwpoison_signal(hva, page_shift);

`hva` is used here even for the is_gmem case, and that should be slightly
concerning. And indeed it is: this is not the appropriate way to handle
hwpoison for gmem (and it differs from the behavior you have for x86). x86
handles this by returning a KVM_MEMORY_FAULT_EXIT to userspace; we should do
the same.

I've put what I think is more appropriate in the diff below.

And just to be clear, IMO, we *cannot* do what you have written now, especially
given that we are getting rid of the userspace_addr sanity check (but that
check was best-effort anyway).

>                 return 0;
> @@ -1677,7 +1701,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
>         kvm_fault_lock(kvm);
>         pgt = vcpu->arch.hw_mmu->pgt;
> -       if (mmu_invalidate_retry(kvm, mmu_seq)) {
> +       if (!is_gmem && mmu_invalidate_retry(kvm, mmu_seq)) {
>                 ret = -EAGAIN;
>                 goto out_unlock;
>         }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f9bb025327c3..b317392453a5 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1884,6 +1884,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
>         return gfn_to_memslot(kvm, gfn)->id;
>  }
>
> +static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> +{
> +       return slot->flags & KVM_MEM_READONLY;
> +}

I think if you're going to move this helper to include/linux/kvm_host.h, you
might want to do so in its own patch and change all of the existing places
where we check KVM_MEM_READONLY directly. *shrug*

> +
>  static inline gfn_t
>  hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6289ea1685dd..6261d8638cd2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2640,11 +2640,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
>         return size;
>  }
>
> -static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> -{
> -       return slot->flags & KVM_MEM_READONLY;
> -}
> -
>  static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
>                                        gfn_t *nr_pages, bool write)
>  {
> --
> 2.49.0.1045.g170613ef41-goog
>

Alright, here's the diff I have in mind:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9a48ef08491db..74eae19792373 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1466,28 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
-static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-			     gfn_t gfn, bool write_fault, bool *writable,
-			     struct page **page, bool is_gmem)
+static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_vcpu *vcpu,
+			     struct kvm_memory_slot *slot, gfn_t gfn,
+			     bool exec_fault, bool write_fault, bool *writable,
+			     struct page **page, bool is_gmem, kvm_pfn_t *pfn)
 {
-	kvm_pfn_t pfn;
 	int ret;
 
-	if (!is_gmem)
-		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
+	if (!is_gmem) {
+		*pfn = __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
+		return 0;
+	}
 
 	*writable = false;
 
-	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
-	if (!ret) {
-		*writable = !memslot_is_readonly(slot);
-		return pfn;
+	ret = kvm_gmem_get_pfn(kvm, slot, gfn, pfn, page, NULL);
+	if (ret) {
+		kvm_prepare_memory_fault_exit(vcpu, gfn << PAGE_SHIFT,
+					      PAGE_SIZE, write_fault,
+					      exec_fault, false);
+		return ret;
 	}
 
-	if (ret == -EHWPOISON)
-		return KVM_PFN_ERR_HWPOISON;
-
-	return KVM_PFN_ERR_NOSLOT_MASK;
+	*writable = !memslot_is_readonly(slot);
+	return 0;
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
@@ -1502,7 +1504,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
-	struct vm_area_struct *vma = NULL;
 	short page_shift = PAGE_SHIFT;
 	void *memcache;
 	gfn_t gfn = ipa >> PAGE_SHIFT;
@@ -1555,6 +1556,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * get block mapping for device MMIO region.
 	 */
 	if (!is_gmem) {
+		struct vm_area_struct *vma = NULL;
+
 		mmap_read_lock(current->mm);
 		vma = vma_lookup(current->mm, hva);
 		if (unlikely(!vma)) {
@@ -1565,33 +1568,44 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
 		mte_allowed = kvm_vma_mte_allowed(vma);
-	}
 
-	if (!force_pte)
-		page_shift = get_vma_page_shift(vma, hva);
+		if (!force_pte)
+			page_shift = get_vma_page_shift(vma, hva);
+
+		/*
+		 * Read mmu_invalidate_seq so that KVM can detect if the results
+		 * of vma_lookup() or faultin_pfn() become stale prior to
+		 * acquiring kvm->mmu_lock.
+		 *
+		 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which
+		 * pairs with the smp_wmb() in kvm_mmu_invalidate_end().
+		 */
+		mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+		mmap_read_unlock(current->mm);
 
-	switch (page_shift) {
+		switch (page_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
-	case PUD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
-			break;
-		fallthrough;
+		case PUD_SHIFT:
+			if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
+				break;
+			fallthrough;
 #endif
-	case CONT_PMD_SHIFT:
-		page_shift = PMD_SHIFT;
-		fallthrough;
-	case PMD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
+		case CONT_PMD_SHIFT:
+			page_shift = PMD_SHIFT;
+			fallthrough;
+		case PMD_SHIFT:
+			if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
+				break;
+			fallthrough;
+		case CONT_PTE_SHIFT:
+			page_shift = PAGE_SHIFT;
+			force_pte = true;
+			fallthrough;
+		case PAGE_SHIFT:
 			break;
-		fallthrough;
-	case CONT_PTE_SHIFT:
-		page_shift = PAGE_SHIFT;
-		force_pte = true;
-		fallthrough;
-	case PAGE_SHIFT:
-		break;
-	default:
-		WARN_ONCE(1, "Unknown page_shift %d", page_shift);
+		default:
+			WARN_ONCE(1, "Unknown page_shift %d", page_shift);
+		}
 	}
 
 	page_size = 1UL << page_shift;
@@ -1633,24 +1647,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ipa &= ~(page_size - 1);
 	}
 
-	if (!is_gmem) {
-		/* Don't use the VMA after the unlock -- it may have vanished */
-		vma = NULL;
-
+	ret = faultin_pfn(kvm, vcpu, memslot, gfn, exec_fault, write_fault,
+			  &writable, &page, is_gmem, &pfn);
+	if (ret)
+		return ret;
+	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		/*
-		 * Read mmu_invalidate_seq so that KVM can detect if the results
-		 * of vma_lookup() or faultin_pfn() become stale prior to
-		 * acquiring kvm->mmu_lock.
-		 *
-		 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which
-		 * pairs with the smp_wmb() in kvm_mmu_invalidate_end().
+		 * For gmem, hwpoison should be communicated via a memory fault
+		 * exit, not via a SIGBUS.
 		 */
-		mmu_seq = vcpu->kvm->mmu_invalidate_seq;
-		mmap_read_unlock(current->mm);
-	}
-
-	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
+		WARN_ON_ONCE(is_gmem);
 		kvm_send_hwpoison_signal(hva, page_shift);
 		return 0;
 	}


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-14 20:40   ` James Houghton
@ 2025-05-15  7:25     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-15  7:25 UTC (permalink / raw)
  To: James Houghton
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

On Wed, 14 May 2025 at 22:40, James Houghton <jthoughton@google.com> wrote:
> 
> On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > This patch enables support for shared memory in guest_memfd, including
> > mapping that memory at the host userspace. This support is gated by the
> > configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> > flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> > guest_memfd instance.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 10 ++++
> >  include/linux/kvm_host.h        | 13 +++++
> >  include/uapi/linux/kvm.h        |  1 +
> >  virt/kvm/Kconfig                |  5 ++
> >  virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
> >  5 files changed, 117 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 709cc2a7ba66..f72722949cae 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >
> >  #ifdef CONFIG_KVM_GMEM
> >  #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> > +
> > +/*
> > + * CoCo VMs with hardware support that use guest_memfd only for backing private
> > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> > + */
> > +#define kvm_arch_vm_supports_gmem_shared_mem(kvm)                      \
> > +       (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
> > +        ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
> > +         (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
> >  #else
> >  #define kvm_arch_supports_gmem(kvm) false
> > +#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
> >  #endif
> >
> >  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index ae70e4e19700..2ec89c214978 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> >  }
> >  #endif
> >
> > +/*
> > + * Returns true if this VM supports shared mem in guest_memfd.
> > + *
> > + * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
> > + * guest_memfd is enabled.
> > + */
> > +#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> > +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> > +{
> > +       return false;
> > +}
> > +#endif
> > +
> >  #ifndef kvm_arch_has_readonly_mem
> >  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
> >  {
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index b6ae8ad8934b..9857022a0f0c 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
> >  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> >
> >  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> > +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED        (1UL << 0)
> >
> >  struct kvm_create_guest_memfd {
> >         __u64 size;
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 559c93ad90be..f4e469a62a60 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> >  config HAVE_KVM_ARCH_GMEM_INVALIDATE
> >         bool
> >         depends on KVM_GMEM
> > +
> > +config KVM_GMEM_SHARED_MEM
> > +       select KVM_GMEM
> > +       bool
> > +       prompt "Enables in-place shared memory for guest_memfd"
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 6db515833f61..8e6d1866b55e 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> >         return gfn - slot->base_gfn + slot->gmem.pgoff;
> >  }
> >
> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > +
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > +       uint64_t flags = (uint64_t)inode->i_private;
> > +
> > +       return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> > +{
> > +       struct inode *inode = file_inode(vmf->vma->vm_file);
> > +       struct folio *folio;
> > +       vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +       filemap_invalidate_lock_shared(inode->i_mapping);
> > +
> > +       folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +       if (IS_ERR(folio)) {
> > +               int err = PTR_ERR(folio);
> > +
> > +               if (err == -EAGAIN)
> > +                       ret = VM_FAULT_RETRY;
> > +               else
> > +                       ret = vmf_error(err);
> > +
> > +               goto out_filemap;
> > +       }
> > +
> > +       if (folio_test_hwpoison(folio)) {
> > +               ret = VM_FAULT_HWPOISON;
> > +               goto out_folio;
> > +       }
> > +
> > +       if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +               ret = VM_FAULT_SIGBUS;
> > +               goto out_folio;
> > +       }
> > +
> > +       if (!folio_test_uptodate(folio)) {
> > +               clear_highpage(folio_page(folio, 0));
> > +               kvm_gmem_mark_prepared(folio);
> > +       }
> > +
> > +       vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +       if (ret != VM_FAULT_LOCKED) {
> > +               folio_unlock(folio);
> > +               folio_put(folio);
> > +       }
> > +
> > +out_filemap:
> > +       filemap_invalidate_unlock_shared(inode->i_mapping);
> > +
> > +       return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > +       .fault = kvm_gmem_fault_shared,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +       if (!kvm_gmem_supports_shared(file_inode(file)))
> > +               return -ENODEV;
> > +
> > +       if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > +           (VM_SHARED | VM_MAYSHARE)) {
> > +               return -EINVAL;
> > +       }
> > +
> > +       vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > +       return 0;
> > +}
> > +#else
> > +#define kvm_gmem_mmap NULL
> > +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> > +
> >  static struct file_operations kvm_gmem_fops = {
> > +       .mmap           = kvm_gmem_mmap,
> >         .open           = generic_file_open,
> >         .release        = kvm_gmem_release,
> >         .fallocate      = kvm_gmem_fallocate,
> > @@ -463,6 +544,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> >         u64 flags = args->flags;
> >         u64 valid_flags = 0;
> >
> > +       if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > +               valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +
> >         if (flags & ~valid_flags)
> >                 return -EINVAL;
> >
> > @@ -501,6 +585,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> >             offset + size > i_size_read(inode))
> >                 goto err;
> >
> > +       if (kvm_gmem_supports_shared(inode) &&
>
> When building without CONFIG_KVM_GMEM_SHARED_MEM, my compiler
> complains that kvm_gmem_supports_shared() is not defined.

Thanks for pointing that out. I'll fix this.
/fuad
>
> > +           !kvm_arch_vm_supports_gmem_shared_mem(kvm))
> > +               goto err;
> > +
> >         filemap_invalidate_lock(inode->i_mapping);
> >
> >         start = offset >> PAGE_SHIFT;
> > --
> > 2.49.0.1045.g170613ef41-goog
> >



* Re: [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults
  2025-05-14 21:26   ` James Houghton
@ 2025-05-15  9:27     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-15  9:27 UTC (permalink / raw)
  To: James Houghton
  Cc: ackerleytng, akpm, amoorthy, anup, aou, brauner, catalin.marinas,
	chao.p.peng, chenhuacai, david, dmatlack, fvdl, hch, hughd,
	ira.weiny, isaku.yamahata, isaku.yamahata, james.morse, jarkko,
	jgg, jhubbard, keirf, kirill.shutemov, kvm, liam.merwick,
	linux-arm-msm, linux-mm, mail, maz, mic, michael.roth, mpe,
	oliver.upton, palmer, pankaj.gupta, paul.walmsley, pbonzini,
	peterx, qperret, quic_cvanscha, quic_eberman, quic_mnalajal,
	quic_pderrin, quic_pheragu, quic_svaddagi, quic_tsoni, rientjes,
	roypat, seanjc, shuah, steven.price, suzuki.poulose, vannapurve,
	vbabka, viro, wei.w.wang, will, willy, xiaoyao.li, yilun.xu,
	yuzenghui

Hi James,

On Wed, 14 May 2025 at 23:26, James Houghton <jthoughton@google.com> wrote:
>
> On Tue, May 13, 2025 at 9:35 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > Add arm64 support for handling guest page faults on guest_memfd
> > backed memslots.
> >
> > For now, the fault granule is restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/kvm/mmu.c     | 94 +++++++++++++++++++++++++---------------
> >  include/linux/kvm_host.h |  5 +++
> >  virt/kvm/kvm_main.c      |  5 ---
> >  3 files changed, 64 insertions(+), 40 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index d756c2b5913f..9a48ef08491d 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1466,6 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> >         return vma->vm_flags & VM_MTE_ALLOWED;
> >  }
> >
> > +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> > +                            gfn_t gfn, bool write_fault, bool *writable,
> > +                            struct page **page, bool is_gmem)
> > +{
> > +       kvm_pfn_t pfn;
> > +       int ret;
> > +
> > +       if (!is_gmem)
> > +               return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> > +
> > +       *writable = false;
> > +
> > +       ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> > +       if (!ret) {
> > +               *writable = !memslot_is_readonly(slot);
> > +               return pfn;
> > +       }
> > +
> > +       if (ret == -EHWPOISON)
> > +               return KVM_PFN_ERR_HWPOISON;
> > +
> > +       return KVM_PFN_ERR_NOSLOT_MASK;
>
> I don't think the above handling for the `ret != 0` case is correct. I think
> we should just be returning `ret` out to userspace.

Ack.

>
> The diff I have below is closer to what I think we must do.
>
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >                           struct kvm_s2_trans *nested,
> >                           struct kvm_memory_slot *memslot, unsigned long hva,
> > @@ -1473,19 +1497,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  {
> >         int ret = 0;
> >         bool write_fault, writable;
> > -       bool exec_fault, mte_allowed;
> > +       bool exec_fault, mte_allowed = false;
> >         bool device = false, vfio_allow_any_uc = false;
> >         unsigned long mmu_seq;
> >         phys_addr_t ipa = fault_ipa;
> >         struct kvm *kvm = vcpu->kvm;
> > -       struct vm_area_struct *vma;
> > -       short page_shift;
> > +       struct vm_area_struct *vma = NULL;
> > +       short page_shift = PAGE_SHIFT;
> >         void *memcache;
> > -       gfn_t gfn;
> > +       gfn_t gfn = ipa >> PAGE_SHIFT;
> >         kvm_pfn_t pfn;
> >         bool logging_active = memslot_is_logging(memslot);
> > -       bool force_pte = logging_active || is_protected_kvm_enabled();
> > -       long page_size, fault_granule;
> > +       bool is_gmem = kvm_slot_has_gmem(memslot);
> > +       bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> > +       long page_size, fault_granule = PAGE_SIZE;
> >         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> >         struct kvm_pgtable *pgt;
> >         struct page *page;
> > @@ -1529,17 +1554,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >          * Let's check if we will get back a huge page backed by hugetlbfs, or
> >          * get block mapping for device MMIO region.
> >          */
> > -       mmap_read_lock(current->mm);
> > -       vma = vma_lookup(current->mm, hva);
> > -       if (unlikely(!vma)) {
> > -               kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > -               mmap_read_unlock(current->mm);
> > -               return -EFAULT;
> > +       if (!is_gmem) {
> > +               mmap_read_lock(current->mm);
> > +               vma = vma_lookup(current->mm, hva);
> > +               if (unlikely(!vma)) {
> > +                       kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > +                       mmap_read_unlock(current->mm);
> > +                       return -EFAULT;
> > +               }
> > +
> > +               vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > +               mte_allowed = kvm_vma_mte_allowed(vma);
> >         }
> >
> > -       if (force_pte)
> > -               page_shift = PAGE_SHIFT;
> > -       else
> > +       if (!force_pte)
> >                 page_shift = get_vma_page_shift(vma, hva);
> >
> >         switch (page_shift) {
> > @@ -1605,27 +1633,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >                 ipa &= ~(page_size - 1);
> >         }
> >
> > -       gfn = ipa >> PAGE_SHIFT;
> > -       mte_allowed = kvm_vma_mte_allowed(vma);
> > -
> > -       vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > -
> > -       /* Don't use the VMA after the unlock -- it may have vanished */
> > -       vma = NULL;
> > +       if (!is_gmem) {
> > +               /* Don't use the VMA after the unlock -- it may have vanished */
> > +               vma = NULL;
>
> I think we can just move the vma declaration inside the earlier `if (!is_gmem)`
> bit above. We should make it really hard to accidentally use `vma` or
> `hva` in the is_gmem case. For `vma` we can easily make that impossible;
> `hva` is harder.

To be honest, I think we need to refactor user_mem_abort(). It's
already a bit messy, and with the guest_memfd code and, hopefully
soon, the pkvm code, it's going to get messier. One of the things to
keep in mind is, as you suggest, ensuring that vma and hva aren't in
scope where they're not needed.

>
> See below for what I think this should look like.
>
> >
> > -       /*
> > -        * Read mmu_invalidate_seq so that KVM can detect if the results of
> > -        * vma_lookup() or __kvm_faultin_pfn() become stale prior to
> > -        * acquiring kvm->mmu_lock.
> > -        *
> > -        * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> > -        * with the smp_wmb() in kvm_mmu_invalidate_end().
> > -        */
> > -       mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> > -       mmap_read_unlock(current->mm);
> > +               /*
> > +                * Read mmu_invalidate_seq so that KVM can detect if the results
> > +                * of vma_lookup() or faultin_pfn() become stale prior to
> > +                * acquiring kvm->mmu_lock.
> > +                *
> > +                * Rely on mmap_read_unlock() for an implicit smp_rmb(), which
> > +                * pairs with the smp_wmb() in kvm_mmu_invalidate_end().
> > +                */
> > +               mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> > +               mmap_read_unlock(current->mm);
> > +       }
> >
> > -       pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> > -                               &writable, &page);
> > +       pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
> >         if (pfn == KVM_PFN_ERR_HWPOISON) {
> >                 kvm_send_hwpoison_signal(hva, page_shift);
>
> `hva` is used here even for the is_gmem case, and that should be slightly
> concerning. And indeed it is: this is not the appropriate way to handle
> hwpoison for gmem (and it differs from the behavior you have for x86). x86
> handles this by returning a KVM_MEMORY_FAULT_EXIT to userspace; we should do
> the same.

You're right. My initial thought was that a best-effort check would be
enough and wouldn't change the arm64 behavior all that much. Exiting
to userspace is cleaner.

> I've put what I think is more appropriate in the diff below.
>
> And just to be clear, IMO, we *cannot* do what you have written now, especially
> given that we are getting rid of the userspace_addr sanity check (but that
> check was best-effort anyway).
>
> >                 return 0;
> > @@ -1677,7 +1701,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >
> >         kvm_fault_lock(kvm);
> >         pgt = vcpu->arch.hw_mmu->pgt;
> > -       if (mmu_invalidate_retry(kvm, mmu_seq)) {
> > +       if (!is_gmem && mmu_invalidate_retry(kvm, mmu_seq)) {
> >                 ret = -EAGAIN;
> >                 goto out_unlock;
> >         }
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f9bb025327c3..b317392453a5 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1884,6 +1884,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
> >         return gfn_to_memslot(kvm, gfn)->id;
> >  }
> >
> > +static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > +{
> > +       return slot->flags & KVM_MEM_READONLY;
> > +}
>
> I think if you're going to move this helper to include/linux/kvm_host.h, you
> might want to do so in its own patch and change all of the existing places
> where we check KVM_MEM_READONLY directly. *shrug*

It's a tough job, but someone's gotta do it :)

>
> > +
> >  static inline gfn_t
> >  hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
> >  {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 6289ea1685dd..6261d8638cd2 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2640,11 +2640,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
> >         return size;
> >  }
> >
> > -static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > -{
> > -       return slot->flags & KVM_MEM_READONLY;
> > -}
> > -
> >  static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
> >                                        gfn_t *nr_pages, bool write)
> >  {
> > --
> > 2.49.0.1045.g170613ef41-goog
> >
>
> Alright, here's the diff I have in mind:

Thank you James.

Cheers,
/fuad


>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 9a48ef08491db..74eae19792373 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1466,28 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>         return vma->vm_flags & VM_MTE_ALLOWED;
>  }
>
> -static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> -                            gfn_t gfn, bool write_fault, bool *writable,
> -                            struct page **page, bool is_gmem)
> +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_vcpu *vcpu,
> +                            struct kvm_memory_slot *slot, gfn_t gfn,
> +                            bool exec_fault, bool write_fault, bool *writable,
> +                            struct page **page, bool is_gmem, kvm_pfn_t *pfn)
>  {
> -       kvm_pfn_t pfn;
>         int ret;
>
> -       if (!is_gmem)
> -               return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> +       if (!is_gmem) {
> +               *pfn = __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> +               return 0;
> +       }
>
>         *writable = false;
>
> -       ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> -       if (!ret) {
> -               *writable = !memslot_is_readonly(slot);
> -               return pfn;
> +       ret = kvm_gmem_get_pfn(kvm, slot, gfn, pfn, page, NULL);
> +       if (ret) {
> +               kvm_prepare_memory_fault_exit(vcpu, gfn << PAGE_SHIFT,
> +                                             PAGE_SIZE, write_fault,
> +                                             exec_fault, false);
> +               return ret;
>         }
>
> -       if (ret == -EHWPOISON)
> -               return KVM_PFN_ERR_HWPOISON;
> -
> -       return KVM_PFN_ERR_NOSLOT_MASK;
> +       *writable = !memslot_is_readonly(slot);
> +       return 0;
>  }
>
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> @@ -1502,7 +1504,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         unsigned long mmu_seq;
>         phys_addr_t ipa = fault_ipa;
>         struct kvm *kvm = vcpu->kvm;
> -       struct vm_area_struct *vma = NULL;
>         short page_shift = PAGE_SHIFT;
>         void *memcache;
>         gfn_t gfn = ipa >> PAGE_SHIFT;
> @@ -1555,6 +1556,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          * get block mapping for device MMIO region.
>          */
>         if (!is_gmem) {
> +               struct vm_area_struct *vma = NULL;
> +
>                 mmap_read_lock(current->mm);
>                 vma = vma_lookup(current->mm, hva);
>                 if (unlikely(!vma)) {
> @@ -1565,33 +1568,44 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
>                 vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
>                 mte_allowed = kvm_vma_mte_allowed(vma);
> -       }
>
> -       if (!force_pte)
> -               page_shift = get_vma_page_shift(vma, hva);
> +               if (!force_pte)
> +                       page_shift = get_vma_page_shift(vma, hva);
> +
> +               /*
> +                * Read mmu_invalidate_seq so that KVM can detect if the results
> +                * of vma_lookup() or faultin_pfn() become stale prior to
> +                * acquiring kvm->mmu_lock.
> +                *
> +                * Rely on mmap_read_unlock() for an implicit smp_rmb(), which
> +                * pairs with the smp_wmb() in kvm_mmu_invalidate_end().
> +                */
> +               mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> +               mmap_read_unlock(current->mm);
>
> -       switch (page_shift) {
> +               switch (page_shift) {
>  #ifndef __PAGETABLE_PMD_FOLDED
> -       case PUD_SHIFT:
> -               if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> -                       break;
> -               fallthrough;
> +               case PUD_SHIFT:
> +                       if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> +                               break;
> +                       fallthrough;
>  #endif
> -       case CONT_PMD_SHIFT:
> -               page_shift = PMD_SHIFT;
> -               fallthrough;
> -       case PMD_SHIFT:
> -               if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> +               case CONT_PMD_SHIFT:
> +                       page_shift = PMD_SHIFT;
> +                       fallthrough;
> +               case PMD_SHIFT:
> +                       if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> +                               break;
> +                       fallthrough;
> +               case CONT_PTE_SHIFT:
> +                       page_shift = PAGE_SHIFT;
> +                       force_pte = true;
> +                       fallthrough;
> +               case PAGE_SHIFT:
>                         break;
> -               fallthrough;
> -       case CONT_PTE_SHIFT:
> -               page_shift = PAGE_SHIFT;
> -               force_pte = true;
> -               fallthrough;
> -       case PAGE_SHIFT:
> -               break;
> -       default:
> -               WARN_ONCE(1, "Unknown page_shift %d", page_shift);
> +               default:
> +                       WARN_ONCE(1, "Unknown page_shift %d", page_shift);
> +               }
>         }
>
>         page_size = 1UL << page_shift;
> @@ -1633,24 +1647,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                 ipa &= ~(page_size - 1);
>         }
>
> -       if (!is_gmem) {
> -               /* Don't use the VMA after the unlock -- it may have vanished */
> -               vma = NULL;
> -
> +       ret = faultin_pfn(kvm, vcpu, memslot, gfn, exec_fault, write_fault,
> +                         &writable, &page, is_gmem, &pfn);
> +       if (ret)
> +               return ret;
> +       if (pfn == KVM_PFN_ERR_HWPOISON) {
>                 /*
> -                * Read mmu_invalidate_seq so that KVM can detect if the results
> -                * of vma_lookup() or faultin_pfn() become stale prior to
> -                * acquiring kvm->mmu_lock.
> -                *
> -                * Rely on mmap_read_unlock() for an implicit smp_rmb(), which
> -                * pairs with the smp_wmb() in kvm_mmu_invalidate_end().
> +                * For gmem, hwpoison should be communicated via a memory fault
> +                * exit, not via a SIGBUS.
>                  */
> -               mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> -               mmap_read_unlock(current->mm);
> -       }
> -
> -       pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
> -       if (pfn == KVM_PFN_ERR_HWPOISON) {
> +               WARN_ON_ONCE(is_gmem);
>                 kvm_send_hwpoison_signal(hva, page_shift);
>                 return 0;
>         }


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
                     ` (3 preceding siblings ...)
  2025-05-14 20:40   ` James Houghton
@ 2025-05-15 23:42   ` Gavin Shan
  2025-05-16  7:31     ` Fuad Tabba
  2025-05-16  6:08   ` Gavin Shan
  2025-05-21  7:41   ` David Hildenbrand
  6 siblings, 1 reply; 88+ messages in thread
From: Gavin Shan @ 2025-05-15 23:42 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

Hi Fuad,

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory at the host userspace. This support is gated by the
> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> guest_memfd instance.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h | 10 ++++
>   include/linux/kvm_host.h        | 13 +++++
>   include/uapi/linux/kvm.h        |  1 +
>   virt/kvm/Kconfig                |  5 ++
>   virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
>   5 files changed, 117 insertions(+)
> 

[...]

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6ae8ad8934b..9857022a0f0c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
>   #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>   
>   #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1UL << 0)
>   

This should be (1ULL << 0), to be consistent with '__u64 struct kvm_create_guest_memfd::flags'.

Thanks,
Gavin




* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-13 16:34 ` [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
@ 2025-05-15 23:50   ` James Houghton
  2025-05-16  7:07     ` Fuad Tabba
  2025-05-21  8:05   ` David Hildenbrand
  1 sibling, 1 reply; 88+ messages in thread
From: James Houghton @ 2025-05-15 23:50 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

On Tue, May 13, 2025 at 9:35 AM Fuad Tabba <tabba@google.com> wrote:
>
> Enable mapping guest_memfd in arm64. For now, it applies to all
> VMs in arm64 that use guest_memfd. In the future, new VM types
> can restrict this via kvm_arch_gmem_supports_shared_mem().
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
>  arch/arm64/kvm/Kconfig            |  1 +
>  2 files changed, 11 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 08ba91e6fb03..2514779f5131 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
>         return true;
>  }
>
> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> +{
> +       return IS_ENABLED(CONFIG_KVM_GMEM);
> +}

This is written as if it is okay for CONFIG_KVM_GMEM not to be
enabled, but when disabling CONFIG_KVM_GMEM you will get an error for
redefining kvm_arch_supports_gmem().

I think you either want to include:

#define kvm_arch_supports_gmem kvm_arch_supports_gmem

or just do something closer to what x86 does:

#ifdef CONFIG_KVM_GMEM
#define kvm_arch_supports_gmem(kvm) true
#endif

> +
> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> +{
> +       return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> +}

And this applies here as well.

#define kvm_arch_vm_supports_gmem_shared_mem kvm_arch_vm_supports_gmem_shared_mem

or

#ifdef CONFIG_KVM_GMEM
#define kvm_arch_vm_supports_gmem_shared_mem(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
#endif

> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 096e45acadb2..8c1e1964b46a 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -38,6 +38,7 @@ menuconfig KVM
>         select HAVE_KVM_VCPU_RUN_PID_CHANGE
>         select SCHED_INFO
>         select GUEST_PERF_EVENTS if PERF_EVENTS
> +       select KVM_GMEM_SHARED_MEM

This makes it impossible to see the error, but I think we should fix
it anyway. :)

>         help
>           Support hosting virtualized guest machines.
>
> --
> 2.49.0.1045.g170613ef41-goog
>



* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
                     ` (4 preceding siblings ...)
  2025-05-15 23:42   ` Gavin Shan
@ 2025-05-16  6:08   ` Gavin Shan
  2025-05-16  7:56     ` Fuad Tabba
  2025-05-21  7:41   ` David Hildenbrand
  6 siblings, 1 reply; 88+ messages in thread
From: Gavin Shan @ 2025-05-16  6:08 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

Hi Fuad,

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory at the host userspace. This support is gated by the
> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> guest_memfd instance.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h | 10 ++++
>   include/linux/kvm_host.h        | 13 +++++
>   include/uapi/linux/kvm.h        |  1 +
>   virt/kvm/Kconfig                |  5 ++
>   virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
>   5 files changed, 117 insertions(+)
> 

[...]

> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..8e6d1866b55e 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>   	return gfn - slot->base_gfn + slot->gmem.pgoff;
>   }
>   
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +
> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> +	uint64_t flags = (uint64_t)inode->i_private;
> +
> +	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	struct folio *folio;
> +	vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +	filemap_invalidate_lock_shared(inode->i_mapping);
> +
> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +	if (IS_ERR(folio)) {
> +		int err = PTR_ERR(folio);
> +
> +		if (err == -EAGAIN)
> +			ret = VM_FAULT_RETRY;
> +		else
> +			ret = vmf_error(err);
> +
> +		goto out_filemap;
> +	}
> +
> +	if (folio_test_hwpoison(folio)) {
> +		ret = VM_FAULT_HWPOISON;
> +		goto out_folio;
> +	}
> +
> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +

I don't think a large folio can be involved here, since the max/min folio
order (stored in struct address_space::flags) should have been set to 0,
meaning only order-0 folios are possible when a page is allocated and
added to the page cache. See AS_FOLIO_ORDER_MASK for more details. The
check is unnecessary but not harmful. Maybe a comment is needed to
mention that large folios aren't supported yet, but please double-check.


> +	if (!folio_test_uptodate(folio)) {
> +		clear_highpage(folio_page(folio, 0));
> +		kvm_gmem_mark_prepared(folio);
> +	}
> +

I must be missing something here. This chunk of code is out of sync with
kvm_gmem_get_pfn(), where kvm_gmem_prepare_folio() and
kvm_arch_gmem_prepare() are executed, and PG_uptodate is set only after
that. In the latest ARM CCA series, kvm_arch_gmem_prepare() isn't used,
but it would delegate the folio (page) with the prerequisite that the
folio belongs to the private address space.

I guess kvm_arch_gmem_prepare() is skipped here on the assumption that
the folio belongs to the shared address space? However, this assumption
isn't always true. We probably need to ensure the folio range really
belongs to the shared address space by checking kvm->mem_attr_array,
which can be modified by the VMM through the ioctl
KVM_SET_MEMORY_ATTRIBUTES.

> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +	if (ret != VM_FAULT_LOCKED) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
> +out_filemap:
> +	filemap_invalidate_unlock_shared(inode->i_mapping);
> +
> +	return ret;
> +}
> +

Thanks,
Gavin




* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-15 23:50   ` James Houghton
@ 2025-05-16  7:07     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-16  7:07 UTC (permalink / raw)
  To: James Houghton
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

Hi James,

On Fri, 16 May 2025 at 01:51, James Houghton <jthoughton@google.com> wrote:
>
> On Tue, May 13, 2025 at 9:35 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > Enable mapping guest_memfd in arm64. For now, it applies to all
> > VMs in arm64 that use guest_memfd. In the future, new VM types
> > can restrict this via kvm_arch_gmem_supports_shared_mem().
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
> >  arch/arm64/kvm/Kconfig            |  1 +
> >  2 files changed, 11 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 08ba91e6fb03..2514779f5131 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
> >         return true;
> >  }
> >
> > +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> > +{
> > +       return IS_ENABLED(CONFIG_KVM_GMEM);
> > +}
>
> This is written as if it is okay for CONFIG_KVM_GMEM not to be
> enabled, but when disabling CONFIG_KVM_GMEM you will get an error for
> redefining kvm_arch_supports_gmem().
>
> I think you either want to include:
>
> #define kvm_arch_supports_gmem kvm_arch_supports_gmem
>
> or just do something closer to what x86 does:
>
> #ifdef CONFIG_KVM_GMEM
> #define kvm_arch_supports_gmem(kvm) true
> #endif
> > +
> > +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> > +{
> > +       return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> > +}
>
> And this applies here as well.
>
> #define kvm_arch_vm_supports_gmem_shared_mem
> kvm_arch_vm_supports_gmem_shared_mem
>
> or
>
> #ifdef CONFIG_KVM_GMEM
> #define kvm_arch_vm_supports_gmem_shared_mem(kvm)
> IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> #endif
>
> > +
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 096e45acadb2..8c1e1964b46a 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -38,6 +38,7 @@ menuconfig KVM
> >         select HAVE_KVM_VCPU_RUN_PID_CHANGE
> >         select SCHED_INFO
> >         select GUEST_PERF_EVENTS if PERF_EVENTS
> > +       select KVM_GMEM_SHARED_MEM
>
> This makes it impossible to see the error, but I think we should fix
> it anyway. :)

Ack.

Thank you!
/fuad

> >         help
> >           Support hosting virtualized guest machines.
> >
> > --
> > 2.49.0.1045.g170613ef41-goog
> >



* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-15 23:42   ` Gavin Shan
@ 2025-05-16  7:31     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-16  7:31 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Gavin,

On Fri, 16 May 2025 at 01:42, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Fuad,
>
> On 5/14/25 2:34 AM, Fuad Tabba wrote:
> > This patch enables support for shared memory in guest_memfd, including
> > mapping that memory at the host userspace. This support is gated by the
> > configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> > flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> > guest_memfd instance.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/x86/include/asm/kvm_host.h | 10 ++++
> >   include/linux/kvm_host.h        | 13 +++++
> >   include/uapi/linux/kvm.h        |  1 +
> >   virt/kvm/Kconfig                |  5 ++
> >   virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
> >   5 files changed, 117 insertions(+)
> >
>
> [...]
>
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index b6ae8ad8934b..9857022a0f0c 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
> >   #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> >
> >   #define KVM_CREATE_GUEST_MEMFD      _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> > +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED      (1UL << 0)
> >
>
> This would be (1ULL << 0) to be consistent with '__u64 struct kvm_create_guest_memfd::flags'

Ack.

Thanks!
/fuad

> Thanks,
> Gavin
>



* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-16  6:08   ` Gavin Shan
@ 2025-05-16  7:56     ` Fuad Tabba
  2025-05-16 11:12       ` Gavin Shan
  0 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-16  7:56 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Gavin,

On Fri, 16 May 2025 at 08:09, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Fuad,
>
> On 5/14/25 2:34 AM, Fuad Tabba wrote:
> > This patch enables support for shared memory in guest_memfd, including
> > mapping that memory at the host userspace. This support is gated by the
> > configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> > flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> > guest_memfd instance.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/x86/include/asm/kvm_host.h | 10 ++++
> >   include/linux/kvm_host.h        | 13 +++++
> >   include/uapi/linux/kvm.h        |  1 +
> >   virt/kvm/Kconfig                |  5 ++
> >   virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
> >   5 files changed, 117 insertions(+)
> >
>
> [...]
>
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 6db515833f61..8e6d1866b55e 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> >       return gfn - slot->base_gfn + slot->gmem.pgoff;
> >   }
> >
> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > +
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > +     uint64_t flags = (uint64_t)inode->i_private;
> > +
> > +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> > +{
> > +     struct inode *inode = file_inode(vmf->vma->vm_file);
> > +     struct folio *folio;
> > +     vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +     filemap_invalidate_lock_shared(inode->i_mapping);
> > +
> > +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +     if (IS_ERR(folio)) {
> > +             int err = PTR_ERR(folio);
> > +
> > +             if (err == -EAGAIN)
> > +                     ret = VM_FAULT_RETRY;
> > +             else
> > +                     ret = vmf_error(err);
> > +
> > +             goto out_filemap;
> > +     }
> > +
> > +     if (folio_test_hwpoison(folio)) {
> > +             ret = VM_FAULT_HWPOISON;
> > +             goto out_folio;
> > +     }
> > +
> > +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +             ret = VM_FAULT_SIGBUS;
> > +             goto out_folio;
> > +     }
> > +
>
> I don't think there is a large folio involved since the max/min folio order
> (stored in struct address_space::flags) should have been set to 0, meaning
> only order-0 is possible when the folio (page) is allocated and added to the
> page-cache. More details can be referred to AS_FOLIO_ORDER_MASK. It's unnecessary
> check but not harmful. Maybe a comment is needed to mention large folio isn't
> around yet, but double confirm.

The idea is to document the lack of hugepage support in code, but if
you think it's necessary, I could add a comment.


>
> > +     if (!folio_test_uptodate(folio)) {
> > +             clear_highpage(folio_page(folio, 0));
> > +             kvm_gmem_mark_prepared(folio);
> > +     }
> > +
>
> I must be missing some thing here. This chunk of code is out of sync to kvm_gmem_get_pfn(),
> where kvm_gmem_prepare_folio() and kvm_arch_gmem_prepare() are executed, and then
> PG_uptodate is set after that. In the latest ARM CCA series, kvm_arch_gmem_prepare()
> isn't used, but it would delegate the folio (page) with the prerequisite that
> the folio belongs to the private address space.
>
> I guess that kvm_arch_gmem_prepare() is skipped here because we have the assumption that
> the folio belongs to the shared address space? However, this assumption isn't always
> true. We probably need to ensure the folio range is really belonging to the shared
> address space by poking kvm->mem_attr_array, which can be modified by VMM through
> ioctl KVM_SET_MEMORY_ATTRIBUTES.

This series only supports shared memory, and the idea is not to use
the attributes for this check. We ensure that only certain VM types
can set the flag (e.g., VM_TYPE_DEFAULT and KVM_X86_SW_PROTECTED_VM).

In the patch series that builds on this one, with in-place conversion
between private and shared, we do add a check that the memory faulted
in is in fact shared.

Thanks,
/fuad

> > +     vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +     if (ret != VM_FAULT_LOCKED) {
> > +             folio_unlock(folio);
> > +             folio_put(folio);
> > +     }
> > +
> > +out_filemap:
> > +     filemap_invalidate_unlock_shared(inode->i_mapping);
> > +
> > +     return ret;
> > +}
> > +
>
> Thanks,
> Gavin
>



* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-16  7:56     ` Fuad Tabba
@ 2025-05-16 11:12       ` Gavin Shan
  2025-05-16 14:20         ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: Gavin Shan @ 2025-05-16 11:12 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Fuad,

On 5/16/25 5:56 PM, Fuad Tabba wrote:
> On Fri, 16 May 2025 at 08:09, Gavin Shan <gshan@redhat.com> wrote:
>> On 5/14/25 2:34 AM, Fuad Tabba wrote:
>>> This patch enables support for shared memory in guest_memfd, including
>>> mapping that memory at the host userspace. This support is gated by the
>>> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
>>> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
>>> guest_memfd instance.
>>>
>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>>    arch/x86/include/asm/kvm_host.h | 10 ++++
>>>    include/linux/kvm_host.h        | 13 +++++
>>>    include/uapi/linux/kvm.h        |  1 +
>>>    virt/kvm/Kconfig                |  5 ++
>>>    virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
>>>    5 files changed, 117 insertions(+)
>>>
>>
>> [...]
>>
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index 6db515833f61..8e6d1866b55e 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>>>        return gfn - slot->base_gfn + slot->gmem.pgoff;
>>>    }
>>>
>>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>> +
>>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>>> +{
>>> +     uint64_t flags = (uint64_t)inode->i_private;
>>> +
>>> +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>>> +}
>>> +
>>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>>> +{
>>> +     struct inode *inode = file_inode(vmf->vma->vm_file);
>>> +     struct folio *folio;
>>> +     vm_fault_t ret = VM_FAULT_LOCKED;
>>> +
>>> +     filemap_invalidate_lock_shared(inode->i_mapping);
>>> +
>>> +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>> +     if (IS_ERR(folio)) {
>>> +             int err = PTR_ERR(folio);
>>> +
>>> +             if (err == -EAGAIN)
>>> +                     ret = VM_FAULT_RETRY;
>>> +             else
>>> +                     ret = vmf_error(err);
>>> +
>>> +             goto out_filemap;
>>> +     }
>>> +
>>> +     if (folio_test_hwpoison(folio)) {
>>> +             ret = VM_FAULT_HWPOISON;
>>> +             goto out_folio;
>>> +     }
>>> +
>>> +     if (WARN_ON_ONCE(folio_test_large(folio))) {
>>> +             ret = VM_FAULT_SIGBUS;
>>> +             goto out_folio;
>>> +     }
>>> +
>>
>> I don't think there is a large folio involved since the max/min folio order
>> (stored in struct address_space::flags) should have been set to 0, meaning
>> only order-0 is possible when the folio (page) is allocated and added to the
>> page-cache. More details can be referred to AS_FOLIO_ORDER_MASK. It's unnecessary
>> check but not harmful. Maybe a comment is needed to mention large folio isn't
>> around yet, but double confirm.
> 
> The idea is to document the lack of hugepage support in code, but if
> you think it's necessary, I could add a comment.
> 

Ok, I was being a bit nit-picky since we're at v9, which I guess is
close to integration. If another respin is needed, a comment wouldn't
be harmful, but it's also perfectly fine without it :)

> 
>>
>>> +     if (!folio_test_uptodate(folio)) {
>>> +             clear_highpage(folio_page(folio, 0));
>>> +             kvm_gmem_mark_prepared(folio);
>>> +     }
>>> +
>>
>> I must be missing some thing here. This chunk of code is out of sync to kvm_gmem_get_pfn(),
>> where kvm_gmem_prepare_folio() and kvm_arch_gmem_prepare() are executed, and then
>> PG_uptodate is set after that. In the latest ARM CCA series, kvm_arch_gmem_prepare()
>> isn't used, but it would delegate the folio (page) with the prerequisite that
>> the folio belongs to the private address space.
>>
>> I guess that kvm_arch_gmem_prepare() is skipped here because we have the assumption that
>> the folio belongs to the shared address space? However, this assumption isn't always
>> true. We probably need to ensure the folio range is really belonging to the shared
>> address space by poking kvm->mem_attr_array, which can be modified by VMM through
>> ioctl KVM_SET_MEMORY_ATTRIBUTES.
> 
> This series only supports shared memory, and the idea is not to use
> the attributes to check. We ensure that only certain VM types can set
> the flag (e.g., VM_TYPE_DEFAULT and KVM_X86_SW_PROTECTED_VM).
> 
> In the patch series that builds on it, with in-place conversion
> between private and shared, we do add a check that the memory faulted
> in is in-fact shared.
> 

Ok, thanks for your clarification. I plan to review that series, but
haven't gotten a chance yet. Right, it's sensible to limit the ability
to modify a page's attribute (private vs shared) to those particular
machine types, since the whole feature (restricted mmap and in-place
conversion) only applies to them. I can understand that
KVM_X86_SW_PROTECTED_VM (similar to pKVM) needs the feature, but I
don't understand why VM_TYPE_DEFAULT needs it. I guess we may want to
use guest-memfd like tmpfs or shmem, meaning the whole address space
associated with a guest-memfd is shared, without a corresponding
private space pointed to by struct kvm_userspace_memory_region2
::userspace_addr. Instead, 'userspace_addr' will be mmap(guest-memfd)
from the VMM's perspective, if I'm correct.

Thanks,
Gavin

> Thanks,
> /fuad
> 
>>> +     vmf->page = folio_file_page(folio, vmf->pgoff);
>>> +
>>> +out_folio:
>>> +     if (ret != VM_FAULT_LOCKED) {
>>> +             folio_unlock(folio);
>>> +             folio_put(folio);
>>> +     }
>>> +
>>> +out_filemap:
>>> +     filemap_invalidate_unlock_shared(inode->i_mapping);
>>> +
>>> +     return ret;
>>> +}
>>> +
>>
>> Thanks,
>> Gavin
>>
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-16 11:12       ` Gavin Shan
@ 2025-05-16 14:20         ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-16 14:20 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Gavin,

On Fri, 16 May 2025 at 13:12, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Fuad,
>
> On 5/16/25 5:56 PM, Fuad Tabba wrote:
> > On Fri, 16 May 2025 at 08:09, Gavin Shan <gshan@redhat.com> wrote:
> >> On 5/14/25 2:34 AM, Fuad Tabba wrote:
> >>> This patch enables support for shared memory in guest_memfd, including
> >>> mapping that memory at the host userspace. This support is gated by the
> >>> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
> >>> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
> >>> guest_memfd instance.
> >>>
> >>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>> ---
> >>>    arch/x86/include/asm/kvm_host.h | 10 ++++
> >>>    include/linux/kvm_host.h        | 13 +++++
> >>>    include/uapi/linux/kvm.h        |  1 +
> >>>    virt/kvm/Kconfig                |  5 ++
> >>>    virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
> >>>    5 files changed, 117 insertions(+)
> >>>
> >>
> >> [...]
> >>
> >>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> >>> index 6db515833f61..8e6d1866b55e 100644
> >>> --- a/virt/kvm/guest_memfd.c
> >>> +++ b/virt/kvm/guest_memfd.c
> >>> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> >>>        return gfn - slot->base_gfn + slot->gmem.pgoff;
> >>>    }
> >>>
> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> +
> >>> +static bool kvm_gmem_supports_shared(struct inode *inode)
> >>> +{
> >>> +     uint64_t flags = (uint64_t)inode->i_private;
> >>> +
> >>> +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> >>> +}
> >>> +
> >>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> >>> +{
> >>> +     struct inode *inode = file_inode(vmf->vma->vm_file);
> >>> +     struct folio *folio;
> >>> +     vm_fault_t ret = VM_FAULT_LOCKED;
> >>> +
> >>> +     filemap_invalidate_lock_shared(inode->i_mapping);
> >>> +
> >>> +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> >>> +     if (IS_ERR(folio)) {
> >>> +             int err = PTR_ERR(folio);
> >>> +
> >>> +             if (err == -EAGAIN)
> >>> +                     ret = VM_FAULT_RETRY;
> >>> +             else
> >>> +                     ret = vmf_error(err);
> >>> +
> >>> +             goto out_filemap;
> >>> +     }
> >>> +
> >>> +     if (folio_test_hwpoison(folio)) {
> >>> +             ret = VM_FAULT_HWPOISON;
> >>> +             goto out_folio;
> >>> +     }
> >>> +
> >>> +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> >>> +             ret = VM_FAULT_SIGBUS;
> >>> +             goto out_folio;
> >>> +     }
> >>> +
> >>
> >> I don't think there is a large folio involved, since the max/min folio order
> >> (stored in struct address_space::flags) should have been set to 0, meaning
> >> only order-0 is possible when the folio (page) is allocated and added to the
> >> page cache. See AS_FOLIO_ORDER_MASK for more details. It's an unnecessary
> >> check but not harmful. Maybe a comment is needed to mention that large folios
> >> aren't around yet, but please double-check.
> >
> > The idea is to document the lack of hugepage support in code, but if
> > you think it's necessary, I could add a comment.
> >
>
> Ok, I was actually being nit-picky since we're at v9, which is close to
> integration, I guess. If another respin is needed, a comment wouldn't be
> harmful, but it's also perfectly fine without it :)
>
> >
> >>
> >>> +     if (!folio_test_uptodate(folio)) {
> >>> +             clear_highpage(folio_page(folio, 0));
> >>> +             kvm_gmem_mark_prepared(folio);
> >>> +     }
> >>> +
> >>
> >> I must be missing something here. This chunk of code is out of sync with kvm_gmem_get_pfn(),
> >> where kvm_gmem_prepare_folio() and kvm_arch_gmem_prepare() are executed, and then
> >> PG_uptodate is set after that. In the latest ARM CCA series, kvm_arch_gmem_prepare()
> >> isn't used, but it would delegate the folio (page) with the prerequisite that
> >> the folio belongs to the private address space.
> >>
> >> I guess kvm_arch_gmem_prepare() is skipped here because we assume that
> >> the folio belongs to the shared address space? However, this assumption isn't always
> >> true. We probably need to ensure the folio range really belongs to the shared
> >> address space by poking kvm->mem_attr_array, which can be modified by the VMM through
> >> the ioctl KVM_SET_MEMORY_ATTRIBUTES.
> >
> > This series only supports shared memory, and the idea is not to use
> > the attributes to check. We ensure that only certain VM types can set
> > the flag (e.g., VM_TYPE_DEFAULT and KVM_X86_SW_PROTECTED_VM).
> >
> > In the patch series that builds on it, with in-place conversion
> > between private and shared, we do add a check that the memory faulted
> > in is in-fact shared.
> >
>
> Ok, thanks for your clarification. I plan to review that series, but haven't
> had a chance yet. Right, it's sensible to limit the capability of modifying a
> page's attribute (private vs shared) to particular machine types, since the
> whole feature (restricted mmap and in-place conversion) only applies to those
> machine types. I can understand that KVM_X86_SW_PROTECTED_VM (similar to pKVM)
> needs the feature, but I don't understand why VM_TYPE_DEFAULT needs it. I guess
> we may want to use guest-memfd like tmpfs or shmem, meaning the entire address
> space associated with a guest-memfd is shared, without the corresponding
> private space pointed to by struct kvm_userspace_memory_region2::userspace_addr.
> Instead, 'userspace_addr' would be mmap(guest-memfd) from the VMM's
> perspective, if I'm correct.

There are two reasons why we're adding this feature for
VM_TYPE_DEFAULT. The first is for VMMs like Firecracker to be able to
run guests backed completely by guest_memfd [1]. Combined with
Patrick's series for direct map removal in guest_memfd [2], this would
allow running VMs that offer additional hardening against Spectre-like
transient execution attacks. The second is that, in the long term,
the hope is for guest_memfd to become the main way of backing guests,
regardless of the type of guest they represent.

If you're interested in finding out more, we had a discussion about this
a couple of weeks ago during the bi-weekly guest_memfd upstream call
(May 1) [3].

Cheers,
/fuad

[1] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[2] https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
[3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAosPOk/edit?tab=t.0#heading=h.jwwteecellpo





> Thanks,
> Gavin
>
> > Thanks,
> > /fuad
> >
> >>> +     vmf->page = folio_file_page(folio, vmf->pgoff);
> >>> +
> >>> +out_folio:
> >>> +     if (ret != VM_FAULT_LOCKED) {
> >>> +             folio_unlock(folio);
> >>> +             folio_put(folio);
> >>> +     }
> >>> +
> >>> +out_filemap:
> >>> +     filemap_invalidate_unlock_shared(inode->i_mapping);
> >>> +
> >>> +     return ret;
> >>> +}
> >>> +
> >>
> >> Thanks,
> >> Gavin
> >>
> >
>



* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 18:37   ` Ackerley Tng
@ 2025-05-16 19:21     ` James Houghton
  2025-05-18 15:17       ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: James Houghton @ 2025-05-16 19:21 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
	mpe, anup, paul.walmsley, palmer, aou, seanjc, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

On Tue, May 13, 2025 at 11:37 AM Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 6db515833f61..8e6d1866b55e 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> >       return gfn - slot->base_gfn + slot->gmem.pgoff;
> >  }
> >
> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > +
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > +     uint64_t flags = (uint64_t)inode->i_private;
> > +
> > +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> > +{
> > +     struct inode *inode = file_inode(vmf->vma->vm_file);
> > +     struct folio *folio;
> > +     vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +     filemap_invalidate_lock_shared(inode->i_mapping);
> > +
> > +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +     if (IS_ERR(folio)) {
> > +             int err = PTR_ERR(folio);
> > +
> > +             if (err == -EAGAIN)
> > +                     ret = VM_FAULT_RETRY;
> > +             else
> > +                     ret = vmf_error(err);
> > +
> > +             goto out_filemap;
> > +     }
> > +
> > +     if (folio_test_hwpoison(folio)) {
> > +             ret = VM_FAULT_HWPOISON;
> > +             goto out_folio;
> > +     }

nit: shmem_fault() does not include an equivalent of the above
HWPOISON check, and __do_fault() already handles HWPOISON.

It's very unlikely for `folio` to be hwpoison and not up-to-date, and
even then, writing over poison (to zero the folio) is not usually
fatal.

> > +
> > +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +             ret = VM_FAULT_SIGBUS;
> > +             goto out_folio;
> > +     }

nit: I would prefer we remove this SIGBUS bit and change the below
clearing logic to handle large folios. Up to you I suppose.

> > +
> > +     if (!folio_test_uptodate(folio)) {
> > +             clear_highpage(folio_page(folio, 0));
> > +             kvm_gmem_mark_prepared(folio);
> > +     }
> > +
> > +     vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +     if (ret != VM_FAULT_LOCKED) {
> > +             folio_unlock(folio);
> > +             folio_put(folio);
> > +     }
> > +
> > +out_filemap:
> > +     filemap_invalidate_unlock_shared(inode->i_mapping);
>
> Do we need to hold the filemap_invalidate_lock while zeroing? Would
> holding the folio lock be enough?

Do we need to hold the filemap_invalidate_lock for reading *at all*?

I don't see why we need it. We're not checking gmem->bindings, and
filemap_grab_folio() already synchronizes with filemap removal
properly.

>
> > +
> > +     return ret;
> > +}



* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-16 19:21     ` James Houghton
@ 2025-05-18 15:17       ` Fuad Tabba
  2025-05-21  7:36         ` David Hildenbrand
  0 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-18 15:17 UTC (permalink / raw)
  To: James Houghton
  Cc: Ackerley Tng, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
	mpe, anup, paul.walmsley, palmer, aou, seanjc, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

Hi James,

On Fri, 16 May 2025 at 21:22, James Houghton <jthoughton@google.com> wrote:
>
> On Tue, May 13, 2025 at 11:37 AM Ackerley Tng <ackerleytng@google.com> wrote:
> >
> > Fuad Tabba <tabba@google.com> writes:
> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > index 6db515833f61..8e6d1866b55e 100644
> > > --- a/virt/kvm/guest_memfd.c
> > > +++ b/virt/kvm/guest_memfd.c
> > > @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> > >       return gfn - slot->base_gfn + slot->gmem.pgoff;
> > >  }
> > >
> > > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > > +
> > > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > > +{
> > > +     uint64_t flags = (uint64_t)inode->i_private;
> > > +
> > > +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > > +}
> > > +
> > > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> > > +{
> > > +     struct inode *inode = file_inode(vmf->vma->vm_file);
> > > +     struct folio *folio;
> > > +     vm_fault_t ret = VM_FAULT_LOCKED;
> > > +
> > > +     filemap_invalidate_lock_shared(inode->i_mapping);
> > > +
> > > +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > > +     if (IS_ERR(folio)) {
> > > +             int err = PTR_ERR(folio);
> > > +
> > > +             if (err == -EAGAIN)
> > > +                     ret = VM_FAULT_RETRY;
> > > +             else
> > > +                     ret = vmf_error(err);
> > > +
> > > +             goto out_filemap;
> > > +     }
> > > +
> > > +     if (folio_test_hwpoison(folio)) {
> > > +             ret = VM_FAULT_HWPOISON;
> > > +             goto out_folio;
> > > +     }
>
> nit: shmem_fault() does not include an equivalent of the above
> HWPOISON check, and __do_fault() already handles HWPOISON.
>
> It's very unlikely for `folio` to be hwpoison and not up-to-date, and
> even then, writing over poison (to zero the folio) is not usually
> fatal.

No strong preference, but the fact that it's still possible (even if
unlikely) makes me lean towards keeping it.

> > > +
> > > +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> > > +             ret = VM_FAULT_SIGBUS;
> > > +             goto out_folio;
> > > +     }
>
> nit: I would prefer we remove this SIGBUS bit and change the below
> clearing logic to handle large folios. Up to you I suppose.

No strong preference here either. This is meant as a way to point out
the lack of hugepage support, based on suggestions from a previous
spin of this series.

> > > +
> > > +     if (!folio_test_uptodate(folio)) {
> > > +             clear_highpage(folio_page(folio, 0));
> > > +             kvm_gmem_mark_prepared(folio);
> > > +     }
> > > +
> > > +     vmf->page = folio_file_page(folio, vmf->pgoff);
> > > +
> > > +out_folio:
> > > +     if (ret != VM_FAULT_LOCKED) {
> > > +             folio_unlock(folio);
> > > +             folio_put(folio);
> > > +     }
> > > +
> > > +out_filemap:
> > > +     filemap_invalidate_unlock_shared(inode->i_mapping);
> >
> > Do we need to hold the filemap_invalidate_lock while zeroing? Would
> > holding the folio lock be enough?
>
> Do we need to hold the filemap_invalidate_lock for reading *at all*?
>
> I don't see why we need it. We're not checking gmem->bindings, and
> filemap_grab_folio() already synchronizes with filemap removal
> properly.

Ack.

Thanks!
/fuad

> >
> > > +
> > > +     return ret;
> > > +}



* Re: [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort()
  2025-05-13 16:34 ` [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort() Fuad Tabba
@ 2025-05-21  2:25   ` Gavin Shan
  2025-05-21  9:57     ` Fuad Tabba
  2025-05-21  8:02   ` David Hildenbrand
  1 sibling, 1 reply; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  2:25 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

Hi Fuad,

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> Guest memory can be backed by guest_memfd or by anonymous memory. Rename
> vma_shift to page_shift and vma_pagesize to page_size to ease
> readability in subsequent patches.
> 
> Suggested-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/arm64/kvm/mmu.c | 54 ++++++++++++++++++++++----------------------
>   1 file changed, 27 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 9865ada04a81..d756c2b5913f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1479,13 +1479,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	phys_addr_t ipa = fault_ipa;
>   	struct kvm *kvm = vcpu->kvm;
>   	struct vm_area_struct *vma;
> -	short vma_shift;
> +	short page_shift;
>   	void *memcache;
>   	gfn_t gfn;
>   	kvm_pfn_t pfn;
>   	bool logging_active = memslot_is_logging(memslot);
>   	bool force_pte = logging_active || is_protected_kvm_enabled();
> -	long vma_pagesize, fault_granule;
> +	long page_size, fault_granule;
>   	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>   	struct kvm_pgtable *pgt;
>   	struct page *page;

[...]

>   
>   	/*
> @@ -1600,9 +1600,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	 * ensure we find the right PFN and lay down the mapping in the right
>   	 * place.
>   	 */
> -	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> -		fault_ipa &= ~(vma_pagesize - 1);
> -		ipa &= ~(vma_pagesize - 1);
> +	if (page_size == PMD_SIZE || page_size == PUD_SIZE) {
> +		fault_ipa &= ~(page_size - 1);
> +		ipa &= ~(page_size - 1);
>   	}
>   

nit: since we're here for readability, ALIGN_DOWN() may be used:

		fault_ipa = ALIGN_DOWN(fault_ipa, page_size);
		ipa = ALIGN_DOWN(ipa, page_size);

Thanks,
Gavin




* Re: [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
  2025-05-13 16:34 ` [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
@ 2025-05-21  2:46   ` Gavin Shan
  2025-05-21  8:24     ` Fuad Tabba
  2025-05-21  8:06   ` David Hildenbrand
  1 sibling, 1 reply; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  2:46 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

Hi Fuad,

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> This patch introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> indicates that guest_memfd supports shared memory (when enabled by the
> flag). This support is limited to certain VM types, determined per
> architecture.
> 
> This patch also updates the KVM documentation with details on the new
> capability, flag, and other information about support for shared memory
> in guest_memfd.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++
>   include/uapi/linux/kvm.h       |  1 +
>   virt/kvm/kvm_main.c            |  4 ++++
>   3 files changed, 23 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 47c7c3f92314..86f74ce7f12a 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6390,6 +6390,24 @@ most one mapping per page, i.e. binding multiple memory regions to a single
>   guest_memfd range is not allowed (any number of memory regions can be bound to
>   a single guest_memfd file, but the bound ranges must not overlap).
>   
> +When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
> +supports GUEST_MEMFD_FLAG_SUPPORT_SHARED.  Setting this flag on guest_memfd
> +creation enables mmap() and faulting of guest_memfd memory to host userspace.
> +
> +When the KVM MMU performs a PFN lookup to service a guest fault and the backing
> +guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then the fault will
> +always be consumed from guest_memfd, regardless of whether it is a shared or a
> +private fault.
> +
> +For these memslots, userspace_addr is checked to be the mmap()-ed view of the
> +same range specified using gmem.pgoff.  Other accesses by KVM, e.g., instruction
> +emulation, go via slot->userspace_addr.  The slot->userspace_addr field can be
> +set to 0 to skip this check, which indicates that KVM would not access memory
> +belonging to the slot via its userspace_addr.
> +

This paragraph needs to be removed if PATCH[08/17] is going to be dropped.

[PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range

> +The use of GUEST_MEMFD_FLAG_SUPPORT_SHARED will not be allowed for CoCo VMs.
> +This is validated when the guest_memfd instance is bound to the VM.
> +
>   See KVM_SET_USER_MEMORY_REGION2 for additional details.
>   

[...]

Thanks,
Gavin




* Re: [PATCH v9 16/17] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-05-13 16:34 ` [PATCH v9 16/17] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
@ 2025-05-21  6:53   ` Gavin Shan
  2025-05-21  9:38     ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  6:53 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

Hi Fuad,

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> Expand the guest_memfd selftests to include testing mapping guest
> memory for VM types that support it.
> 
> Also, build the guest_memfd selftest for arm64.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   tools/testing/selftests/kvm/Makefile.kvm      |   1 +
>   .../testing/selftests/kvm/guest_memfd_test.c  | 145 +++++++++++++++---
>   2 files changed, 126 insertions(+), 20 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index f62b0a5aba35..ccf95ed037c3 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -163,6 +163,7 @@ TEST_GEN_PROGS_arm64 += access_tracking_perf_test
>   TEST_GEN_PROGS_arm64 += arch_timer
>   TEST_GEN_PROGS_arm64 += coalesced_io_test
>   TEST_GEN_PROGS_arm64 += dirty_log_perf_test
> +TEST_GEN_PROGS_arm64 += guest_memfd_test
>   TEST_GEN_PROGS_arm64 += get-reg-list
>   TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
>   TEST_GEN_PROGS_arm64 += memslot_perf_test
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index ce687f8d248f..443c49185543 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -34,12 +34,46 @@ static void test_file_read_write(int fd)
>   		    "pwrite on a guest_mem fd should fail");
>   }
>   
> -static void test_mmap(int fd, size_t page_size)
> +static void test_mmap_allowed(int fd, size_t page_size, size_t total_size)
> +{
> +	const char val = 0xaa;
> +	char *mem;
> +	size_t i;
> +	int ret;
> +
> +	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	TEST_ASSERT(mem != MAP_FAILED, "mmaping() guest memory should pass.");
> +
> +	memset(mem, val, total_size);
> +	for (i = 0; i < total_size; i++)
> +		TEST_ASSERT_EQ(mem[i], val);
> +
> +	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
> +			page_size);
> +	TEST_ASSERT(!ret, "fallocate the first page should succeed");
> +
> +	for (i = 0; i < page_size; i++)
> +		TEST_ASSERT_EQ(mem[i], 0x00);
> +	for (; i < total_size; i++)
> +		TEST_ASSERT_EQ(mem[i], val);
> +
> +	memset(mem, val, total_size);
> +	for (i = 0; i < total_size; i++)
> +		TEST_ASSERT_EQ(mem[i], val);
> +

The last memset() and the check of the written values look redundant, because
the same thing has already been covered by the first memset(). If we really
want to double-check that the page cache is writable, it would be enough to
cover the first page. Otherwise, I guess this hunk of code can be removed :)

	memset(mem, val, page_size);
	for (i = 0; i < page_size; i++)
		TEST_ASSERT_EQ(mem[i], val);

> +	ret = munmap(mem, total_size);
> +	TEST_ASSERT(!ret, "munmap should succeed");
> +}
> +
> +static void test_mmap_denied(int fd, size_t page_size, size_t total_size)
>   {
>   	char *mem;
>   
>   	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>   	TEST_ASSERT_EQ(mem, MAP_FAILED);
> +
> +	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	TEST_ASSERT_EQ(mem, MAP_FAILED);
>   }
>   
>   static void test_file_size(int fd, size_t page_size, size_t total_size)
> @@ -120,26 +154,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
>   	}
>   }
>   
> -static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
> +static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
> +						  uint64_t guest_memfd_flags,
> +						  size_t page_size)
>   {
> -	size_t page_size = getpagesize();
> -	uint64_t flag;
>   	size_t size;
>   	int fd;
>   
>   	for (size = 1; size < page_size; size++) {
> -		fd = __vm_create_guest_memfd(vm, size, 0);
> +		fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
>   		TEST_ASSERT(fd == -1 && errno == EINVAL,
>   			    "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
>   			    size);
>   	}
> -
> -	for (flag = BIT(0); flag; flag <<= 1) {
> -		fd = __vm_create_guest_memfd(vm, page_size, flag);
> -		TEST_ASSERT(fd == -1 && errno == EINVAL,
> -			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> -			    flag);
> -	}
>   }
>   
>   static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> @@ -170,30 +197,108 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
>   	close(fd1);
>   }
>   
> -int main(int argc, char *argv[])
> +static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
> +			   bool expect_mmap_allowed)
>   {
> -	size_t page_size;
> +	struct kvm_vm *vm;
>   	size_t total_size;
> +	size_t page_size;
>   	int fd;
> -	struct kvm_vm *vm;
>   
> -	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> +	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
> +		return;
>   

The check seems incorrect for aarch64, since kvm_check_cap(KVM_CAP_VM_TYPES)
always returns 0 there, so the test is skipped even for VM_TYPE_DEFAULT on
aarch64. It would need to be something like below:

	#define VM_TYPE_DEFAULT		0

	if (vm_type != VM_TYPE_DEFAULT &&
	    !(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
		return;

>   	page_size = getpagesize();
>   	total_size = page_size * 4;
>   
> -	vm = vm_create_barebones();
> +	vm = vm_create_barebones_type(vm_type);
>   
> -	test_create_guest_memfd_invalid(vm);
>   	test_create_guest_memfd_multiple(vm);
> +	test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
>   
> -	fd = vm_create_guest_memfd(vm, total_size, 0);
> +	fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
>   
>   	test_file_read_write(fd);
> -	test_mmap(fd, page_size);
> +
> +	if (expect_mmap_allowed)
> +		test_mmap_allowed(fd, page_size, total_size);
> +	else
> +		test_mmap_denied(fd, page_size, total_size);
> +
>   	test_file_size(fd, page_size, total_size);
>   	test_fallocate(fd, page_size, total_size);
>   	test_invalid_punch_hole(fd, page_size, total_size);
>   
>   	close(fd);
> +	kvm_vm_release(vm);
> +}
> +
> +static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
> +					    uint64_t expected_valid_flags)
> +{
> +	size_t page_size = getpagesize();
> +	struct kvm_vm *vm;
> +	uint64_t flag = 0;
> +	int fd;
> +
> +	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
> +		return;

Same as above

> +
> +	vm = vm_create_barebones_type(vm_type);
> +
> +	for (flag = BIT(0); flag; flag <<= 1) {
> +		fd = __vm_create_guest_memfd(vm, page_size, flag);
> +
> +		if (flag & expected_valid_flags) {
> +			TEST_ASSERT(fd > 0,
> +				    "guest_memfd() with flag '0x%lx' should be valid",
> +				    flag);
> +			close(fd);
> +		} else {
> +			TEST_ASSERT(fd == -1 && errno == EINVAL,
> +				    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> +				    flag);

It's more robust to have:

			TEST_ASSERT(fd < 0 && errno == EINVAL, ...);

> +		}
> +	}
> +
> +	kvm_vm_release(vm);
> +}
> +
> +static void test_gmem_flag_validity(void)
> +{
> +	uint64_t non_coco_vm_valid_flags = 0;
> +
> +	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
> +		non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
> +	test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
> +
> +#ifdef __x86_64__
> +	test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, non_coco_vm_valid_flags);
> +	test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
> +	test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
> +	test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
> +	test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
> +#endif
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> +
> +	test_gmem_flag_validity();
> +
> +	test_with_type(VM_TYPE_DEFAULT, 0, false);
> +	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> +		test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_SUPPORT_SHARED,
> +			       true);
> +	}
> +
> +#ifdef __x86_64__
> +	test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
> +	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> +		test_with_type(KVM_X86_SW_PROTECTED_VM,
> +			       GUEST_MEMFD_FLAG_SUPPORT_SHARED, true);
> +	}
> +#endif
>   }

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 01/17] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
  2025-05-13 16:34 ` [PATCH v9 01/17] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
@ 2025-05-21  7:14   ` Gavin Shan
  0 siblings, 0 replies; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  7:14 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> The option KVM_PRIVATE_MEM enables guest_memfd in general. Subsequent
> patches add shared memory support to guest_memfd. Therefore, rename it
> to KVM_GMEM to make its purpose clearer.
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  2 +-
>   include/linux/kvm_host.h        | 10 +++++-----
>   virt/kvm/Kconfig                |  8 ++++----
>   virt/kvm/Makefile.kvm           |  2 +-
>   virt/kvm/kvm_main.c             |  4 ++--
>   virt/kvm/kvm_mm.h               |  4 ++--
>   6 files changed, 15 insertions(+), 15 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>




* Re: [PATCH v9 02/17] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-05-13 16:34 ` [PATCH v9 02/17] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
  2025-05-13 21:56   ` Ira Weiny
@ 2025-05-21  7:14   ` Gavin Shan
  1 sibling, 0 replies; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  7:14 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
> guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
> clearer.
> 
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/kvm/Kconfig     | 4 ++--
>   include/linux/kvm_host.h | 2 +-
>   virt/kvm/Kconfig         | 2 +-
>   virt/kvm/guest_memfd.c   | 2 +-
>   4 files changed, 5 insertions(+), 5 deletions(-)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>




* Re: [PATCH v9 03/17] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
  2025-05-13 16:34 ` [PATCH v9 03/17] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
@ 2025-05-21  7:15   ` Gavin Shan
  0 siblings, 0 replies; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  7:15 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> The function kvm_arch_has_private_mem() is used to indicate whether
> guest_memfd is supported by the architecture, which until now implies
> that its private. To decouple guest_memfd support from whether the
> memory is private, rename this function to kvm_arch_supports_gmem().
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h | 8 ++++----
>   arch/x86/kvm/mmu/mmu.c          | 8 ++++----
>   include/linux/kvm_host.h        | 6 +++---
>   virt/kvm/kvm_main.c             | 6 +++---
>   4 files changed, 14 insertions(+), 14 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>




* Re: [PATCH v9 04/17] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
  2025-05-13 16:34 ` [PATCH v9 04/17] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
@ 2025-05-21  7:15   ` Gavin Shan
  0 siblings, 0 replies; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  7:15 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> The bool has_private_mem is used to indicate whether guest_memfd is
> supported. Rename it to supports_gmem to make its meaning clearer and to
> decouple memory being private from guest_memfd.
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h | 4 ++--
>   arch/x86/kvm/mmu/mmu.c          | 2 +-
>   arch/x86/kvm/svm/svm.c          | 4 ++--
>   arch/x86/kvm/x86.c              | 3 +--
>   4 files changed, 6 insertions(+), 7 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>




* Re: [PATCH v9 05/17] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
  2025-05-13 16:34 ` [PATCH v9 05/17] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
@ 2025-05-21  7:16   ` Gavin Shan
  0 siblings, 0 replies; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  7:16 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> The function kvm_slot_can_be_private() is used to check whether a memory
> slot is backed by guest_memfd. Rename it to kvm_slot_has_gmem() to make
> that clearer and to decouple memory being private from guest_memfd.
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c   | 4 ++--
>   arch/x86/kvm/svm/sev.c   | 4 ++--
>   include/linux/kvm_host.h | 2 +-
>   virt/kvm/guest_memfd.c   | 2 +-
>   4 files changed, 6 insertions(+), 6 deletions(-)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>




* Re: [PATCH v9 06/17] KVM: Fix comments that refer to slots_lock
  2025-05-13 16:34 ` [PATCH v9 06/17] KVM: Fix comments that refer to slots_lock Fuad Tabba
@ 2025-05-21  7:16   ` Gavin Shan
  0 siblings, 0 replies; 88+ messages in thread
From: Gavin Shan @ 2025-05-21  7:16 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 5/14/25 2:34 AM, Fuad Tabba wrote:
> Fix comments so that they refer to slots_lock instead of slots_locks
> (remove trailing s).
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   include/linux/kvm_host.h | 2 +-
>   virt/kvm/kvm_main.c      | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>




* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-18 15:17       ` Fuad Tabba
@ 2025-05-21  7:36         ` David Hildenbrand
  0 siblings, 0 replies; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21  7:36 UTC (permalink / raw)
  To: Fuad Tabba, James Houghton
  Cc: Ackerley Tng, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
	mpe, anup, paul.walmsley, palmer, aou, seanjc, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

On 18.05.25 17:17, Fuad Tabba wrote:
> Hi James,
> 
> On Fri, 16 May 2025 at 21:22, James Houghton <jthoughton@google.com> wrote:
>>
>> On Tue, May 13, 2025 at 11:37 AM Ackerley Tng <ackerleytng@google.com> wrote:
>>>
>>> Fuad Tabba <tabba@google.com> writes:
>>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>>> index 6db515833f61..8e6d1866b55e 100644
>>>> --- a/virt/kvm/guest_memfd.c
>>>> +++ b/virt/kvm/guest_memfd.c
>>>> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>>>>        return gfn - slot->base_gfn + slot->gmem.pgoff;
>>>>   }
>>>>
>>>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>>> +
>>>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>>>> +{
>>>> +     uint64_t flags = (uint64_t)inode->i_private;
>>>> +
>>>> +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>>>> +}
>>>> +
>>>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>>>> +{
>>>> +     struct inode *inode = file_inode(vmf->vma->vm_file);
>>>> +     struct folio *folio;
>>>> +     vm_fault_t ret = VM_FAULT_LOCKED;
>>>> +
>>>> +     filemap_invalidate_lock_shared(inode->i_mapping);
>>>> +
>>>> +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>>> +     if (IS_ERR(folio)) {
>>>> +             int err = PTR_ERR(folio);
>>>> +
>>>> +             if (err == -EAGAIN)
>>>> +                     ret = VM_FAULT_RETRY;
>>>> +             else
>>>> +                     ret = vmf_error(err);
>>>> +
>>>> +             goto out_filemap;
>>>> +     }
>>>> +
>>>> +     if (folio_test_hwpoison(folio)) {
>>>> +             ret = VM_FAULT_HWPOISON;
>>>> +             goto out_folio;
>>>> +     }
>>
>> nit: shmem_fault() does not include an equivalent of the above
>> HWPOISON check, and __do_fault() already handles HWPOISON.
>>
>> It's very unlikely for `folio` to be hwpoison and not up-to-date, and
>> even then, writing over poison (to zero the folio) is not usually
>> fatal.
> 
> No strong preference, but the fact that it's still possible (even if
> unlikely) makes me lean towards keeping it.

__do_fault() indeed seems to handle it, so probably best to drop that 
for now.

>>>> +
>>>> +     if (WARN_ON_ONCE(folio_test_large(folio))) {
>>>> +             ret = VM_FAULT_SIGBUS;
>>>> +             goto out_folio;
>>>> +     }
>>
>> nit: I would prefer we remove this SIGBUS bit and change the below
>> clearing logic to handle large folios. Up to you I suppose.
> 
> No strong preference here either. This is meant as a way to point out
> the lack of hugepage support, based on suggestions from a previous
> spin of this series.

Yeah. With in-place conversion, we should never see large folios on this 
path. With shared-only VMs it will be different.

So for now, we can just leave it in and whoever stumbles over it can 
properly reason why it is OK for their use case to remove it.

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
                     ` (5 preceding siblings ...)
  2025-05-16  6:08   ` Gavin Shan
@ 2025-05-21  7:41   ` David Hildenbrand
  6 siblings, 0 replies; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21  7:41 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny


>   struct kvm_create_guest_memfd {
>   	__u64 size;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..f4e469a62a60 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>   config HAVE_KVM_ARCH_GMEM_INVALIDATE
>          bool
>          depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> +       select KVM_GMEM
> +       bool
> +       prompt "Enables in-place shared memory for guest_memfd"

Not completely accurate :)

"Enable support for non-private ("shared") memory in guest_memfd" ?



-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-05-13 16:34 ` [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
@ 2025-05-21  7:48   ` David Hildenbrand
  2025-05-22  0:40     ` Ackerley Tng
  0 siblings, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21  7:48 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 13.05.25 18:34, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> For memslots backed by guest_memfd with shared mem support, the KVM MMU
> always faults-in pages from guest_memfd, and not from the userspace_addr.
> Towards this end, this patch also introduces a new guest_memfd flag,
> GUEST_MEMFD_FLAG_SUPPORT_SHARED, which indicates that the guest_memfd
> instance supports in-place shared memory.
> 
> This flag is only supported if the VM creating the guest_memfd instance
> belongs to certain types determined by architecture. Only non-CoCo VMs
> are permitted to use guest_memfd with shared mem, for now.
> 
> Function names have also been updated for accuracy -
> kvm_mem_is_private() returns true only when the current private/shared
> state (in the CoCo sense) of the memory is private, and returns false if
> the current state is shared explicitly or impicitly, e.g., belongs to a
> non-CoCo VM.
> 
> kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used
> to fault in not just private memory, but more generally, from
> guest_memfd.
> 
> Co-developed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---


[...]

> +
>   #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
>   static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>   {
> @@ -2515,10 +2524,30 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
>   bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>   					 struct kvm_gfn_range *range);
>   
> +/*
> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
> + * private.
> + *
> + * A return value of false indicates that the gfn is explicitly or implicity

s/implicity/implicitly/

> + * shared (i.e., non-CoCo VMs).
> + */
>   static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>   {
> -	return IS_ENABLED(CONFIG_KVM_GMEM) &&
> -	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +	struct kvm_memory_slot *slot;
> +
> +	if (!IS_ENABLED(CONFIG_KVM_GMEM))
> +		return false;
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
> +		/*
> +		 * For now, memslots only support in-place shared memory if the
> +		 * host is allowed to mmap memory (i.e., non-Coco VMs).
> +		 */

Not accurate: there is no in-place conversion support in this series,
because there is no such interface. So the reason is that all memory is
shared for these VM types?

> +		return false;
> +	}
> +
> +	return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>   }
>   #else
>   static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 2f499021df66..fe0245335c96 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -388,6 +388,23 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>   
>   	return 0;
>   }
> +
> +bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
> +{
> +	struct file *file;
> +	bool ret;
> +
> +	file = kvm_gmem_get_file((struct kvm_memory_slot *)slot);
> +	if (!file)
> +		return false;
> +
> +	ret = kvm_gmem_supports_shared(file_inode(file));
> +
> +	fput(file);
> +	return ret;

Would it make sense to cache that information in the memslot, to avoid
the get/put?

We could simply cache it when creating the memslot, I guess.

As an alternative ... could we simply get/put when managing the memslot?

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-13 16:34 ` [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd Fuad Tabba
  2025-05-14  7:13   ` Shivank Garg
  2025-05-14 15:27   ` kernel test robot
@ 2025-05-21  8:01   ` David Hildenbrand
  2025-05-22  0:45     ` Ackerley Tng
  2025-05-22  7:22     ` Fuad Tabba
  2 siblings, 2 replies; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21  8:01 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 13.05.25 18:34, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> This patch adds kvm_gmem_max_mapping_level(), which always returns
> PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
> 
> When guest_memfd supports shared memory, max_mapping_level (especially
> when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
> from recover_huge_pages_range()) should take input from
> guest_memfd.
> 
> Input from guest_memfd should be taken in these cases:
> 
> + if the memslot supports shared memory (guest_memfd is used for
>    shared memory, or in future both shared and private memory) or
> + if the memslot is only used for private memory and that gfn is
>    private.
> 
> If the memslot doesn't use guest_memfd, figure out the
> max_mapping_level using the host page tables like before.
> 
> This patch also refactors and inlines the other call to
> __kvm_mmu_max_mapping_level().
> 
> In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
> provided (if applicable) in fault->max_level. Hence, there is no need
> to query guest_memfd.
> 
> lpage_info is queried like before, and then if the fault is not from
> guest_memfd, adjust fault->req_level based on input from host page
> tables.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
>   include/linux/kvm_host.h |  7 +++
>   virt/kvm/guest_memfd.c   | 12 ++++++
>   3 files changed, 79 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index cfbb471f7c70..9e0bc8114859 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>   	return level;
>   }
[...]

>   static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>   					    struct kvm_page_fault *fault,
>   					    int order)
> @@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>   {
>   	unsigned int foll = fault->write ? FOLL_WRITE : 0;
>   
> -	if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
> +	if (fault_from_gmem(fault))

Should this change rather have been done in the previous patch?

(then only adjust fault_from_gmem() in this function as required)

>   		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>   
>   	foll |= FOLL_NOWAIT;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index de7b46ee1762..f9bb025327c3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>   int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>   		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>   		     int *max_order);
> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
>   #else
>   static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>   				   struct kvm_memory_slot *slot, gfn_t gfn,
> @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>   	KVM_BUG_ON(1, kvm);
>   	return -EIO;
>   }
> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
> +					 gfn_t gfn)

Probably should indent with two tabs here.



-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort()
  2025-05-13 16:34 ` [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort() Fuad Tabba
  2025-05-21  2:25   ` Gavin Shan
@ 2025-05-21  8:02   ` David Hildenbrand
  1 sibling, 0 replies; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21  8:02 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 13.05.25 18:34, Fuad Tabba wrote:
> Guest memory can be backed by guest_memfd or by anonymous memory. Rename
> vma_shift to page_shift and vma_pagesize to page_size to ease
> readability in subsequent patches.
> 
> Suggested-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults
  2025-05-13 16:34 ` [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
  2025-05-14 21:26   ` James Houghton
@ 2025-05-21  8:04   ` David Hildenbrand
  2025-05-21 11:10     ` Fuad Tabba
  1 sibling, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21  8:04 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 13.05.25 18:34, Fuad Tabba wrote:
> Add arm64 support for handling guest page faults on guest_memfd
> backed memslots.
> 
> For now, the fault granule is restricted to PAGE_SIZE.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---

[...]

> +	if (!is_gmem) {

Should we add a comment somewhere, stating that we don't support VMs 
with private memory, so if we have a gmem, all faults are routed through 
that?

> +		mmap_read_lock(current->mm);
> +		vma = vma_lookup(current->mm, hva);
> +		if (unlikely(!vma)) {
> +			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> +			mmap_read_unlock(current->mm);
> +			return -EFAULT;
> +		}
> +
> +		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> +		mte_allowed = kvm_vma_mte_allowed(vma);

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-13 16:34 ` [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
  2025-05-15 23:50   ` James Houghton
@ 2025-05-21  8:05   ` David Hildenbrand
  2025-05-21 10:12     ` Fuad Tabba
  1 sibling, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21  8:05 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 13.05.25 18:34, Fuad Tabba wrote:
> Enable mapping guest_memfd in arm64. For now, it applies to all
> VMs in arm64 that use guest_memfd. In the future, new VM types
> can restrict this via kvm_arch_gmem_supports_shared_mem().
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
>   arch/arm64/kvm/Kconfig            |  1 +
>   2 files changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 08ba91e6fb03..2514779f5131 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
>   	return true;
>   }
>   
> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> +{
> +	return IS_ENABLED(CONFIG_KVM_GMEM);
> +}
> +
> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> +{
> +	return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> +}
> +
>   #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 096e45acadb2..8c1e1964b46a 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -38,6 +38,7 @@ menuconfig KVM
>   	select HAVE_KVM_VCPU_RUN_PID_CHANGE
>   	select SCHED_INFO
>   	select GUEST_PERF_EVENTS if PERF_EVENTS
> +	select KVM_GMEM_SHARED_MEM
>   	help
>   	  Support hosting virtualized guest machines.
>   

Do we have to reject somewhere if we are given a guest_memfd that was 
*not* created using the SHARED flag? Or will existing checks already 
reject that?

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
  2025-05-13 16:34 ` [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
  2025-05-21  2:46   ` Gavin Shan
@ 2025-05-21  8:06   ` David Hildenbrand
  1 sibling, 0 replies; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21  8:06 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 13.05.25 18:34, Fuad Tabba wrote:
> This patch introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> indicates that guest_memfd supports shared memory (when enabled by the
> flag). This support is limited to certain VM types, determined per
> architecture.
> 
> This patch also updates the KVM documentation with details on the new
> capability, flag, and other information about support for shared memory
> in guest_memfd.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++
>   include/uapi/linux/kvm.h       |  1 +
>   virt/kvm/kvm_main.c            |  4 ++++
>   3 files changed, 23 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 47c7c3f92314..86f74ce7f12a 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6390,6 +6390,24 @@ most one mapping per page, i.e. binding multiple memory regions to a single
>   guest_memfd range is not allowed (any number of memory regions can be bound to
>   a single guest_memfd file, but the bound ranges must not overlap).
>   
> +When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
> +supports GUEST_MEMFD_FLAG_SUPPORT_SHARED.  Setting this flag on guest_memfd
> +creation enables mmap() and faulting of guest_memfd memory to host userspace.
> +
> +When the KVM MMU performs a PFN lookup to service a guest fault and the backing
> +guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then the fault will
> +always be consumed from guest_memfd, regardless of whether it is a shared or a
> +private fault.
> +
> +For these memslots, userspace_addr is checked to be the mmap()-ed view of the
> +same range specified using gmem.pgoff.  Other accesses by KVM, e.g., instruction
> +emulation, go via slot->userspace_addr.  The slot->userspace_addr field can be
> +set to 0 to skip this check, which indicates that KVM would not access memory
> +belonging to the slot via its userspace_addr.
> +
> +The use of GUEST_MEMFD_FLAG_SUPPORT_SHARED will not be allowed for CoCo VMs.
> +This is validated when the guest_memfd instance is bound to the VM.
> +
>   See KVM_SET_USER_MEMORY_REGION2 for additional details.

With Gavin's comment addressed

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
  2025-05-21  2:46   ` Gavin Shan
@ 2025-05-21  8:24     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21  8:24 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Wed, 21 May 2025 at 03:47, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Fuad,
>
> On 5/14/25 2:34 AM, Fuad Tabba wrote:
> > This patch introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> > indicates that guest_memfd supports shared memory (when enabled by the
> > flag). This support is limited to certain VM types, determined per
> > architecture.
> >
> > This patch also updates the KVM documentation with details on the new
> > capability, flag, and other information about support for shared memory
> > in guest_memfd.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++
> >   include/uapi/linux/kvm.h       |  1 +
> >   virt/kvm/kvm_main.c            |  4 ++++
> >   3 files changed, 23 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 47c7c3f92314..86f74ce7f12a 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6390,6 +6390,24 @@ most one mapping per page, i.e. binding multiple memory regions to a single
> >   guest_memfd range is not allowed (any number of memory regions can be bound to
> >   a single guest_memfd file, but the bound ranges must not overlap).
> >
> > +When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
> > +supports GUEST_MEMFD_FLAG_SUPPORT_SHARED.  Setting this flag on guest_memfd
> > +creation enables mmap() and faulting of guest_memfd memory to host userspace.
> > +
> > +When the KVM MMU performs a PFN lookup to service a guest fault and the backing
> > +guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then the fault will
> > +always be consumed from guest_memfd, regardless of whether it is a shared or a
> > +private fault.
> > +
> > +For these memslots, userspace_addr is checked to be the mmap()-ed view of the
> > +same range specified using gmem.pgoff.  Other accesses by KVM, e.g., instruction
> > +emulation, go via slot->userspace_addr.  The slot->userspace_addr field can be
> > +set to 0 to skip this check, which indicates that KVM would not access memory
> > +belonging to the slot via its userspace_addr.
> > +
>
> This paragraph needs to be removed if PATCH[08/17] is going to be dropped.

Done.

Thanks,
/fuad
>
> [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range
>
> > +The use of GUEST_MEMFD_FLAG_SUPPORT_SHARED will not be allowed for CoCo VMs.
> > +This is validated when the guest_memfd instance is bound to the VM.
> > +
> >   See KVM_SET_USER_MEMORY_REGION2 for additional details.
> >
>
> [...]
>
> Thanks,
> Gavin
>



* Re: [PATCH v9 16/17] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-05-21  6:53   ` Gavin Shan
@ 2025-05-21  9:38     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21  9:38 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Gavin,

On Wed, 21 May 2025 at 07:53, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Fuad,
>
> On 5/14/25 2:34 AM, Fuad Tabba wrote:
> > Expand the guest_memfd selftests to include testing mapping guest
> > memory for VM types that support it.
> >
> > Also, build the guest_memfd selftest for arm64.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   tools/testing/selftests/kvm/Makefile.kvm      |   1 +
> >   .../testing/selftests/kvm/guest_memfd_test.c  | 145 +++++++++++++++---
> >   2 files changed, 126 insertions(+), 20 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> > index f62b0a5aba35..ccf95ed037c3 100644
> > --- a/tools/testing/selftests/kvm/Makefile.kvm
> > +++ b/tools/testing/selftests/kvm/Makefile.kvm
> > @@ -163,6 +163,7 @@ TEST_GEN_PROGS_arm64 += access_tracking_perf_test
> >   TEST_GEN_PROGS_arm64 += arch_timer
> >   TEST_GEN_PROGS_arm64 += coalesced_io_test
> >   TEST_GEN_PROGS_arm64 += dirty_log_perf_test
> > +TEST_GEN_PROGS_arm64 += guest_memfd_test
> >   TEST_GEN_PROGS_arm64 += get-reg-list
> >   TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
> >   TEST_GEN_PROGS_arm64 += memslot_perf_test
> > diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> > index ce687f8d248f..443c49185543 100644
> > --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> > +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> > @@ -34,12 +34,46 @@ static void test_file_read_write(int fd)
> >                   "pwrite on a guest_mem fd should fail");
> >   }
> >
> > -static void test_mmap(int fd, size_t page_size)
> > +static void test_mmap_allowed(int fd, size_t page_size, size_t total_size)
> > +{
> > +     const char val = 0xaa;
> > +     char *mem;
> > +     size_t i;
> > +     int ret;
> > +
> > +     mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +     TEST_ASSERT(mem != MAP_FAILED, "mmap()ing guest memory should pass.");
> > +
> > +     memset(mem, val, total_size);
> > +     for (i = 0; i < total_size; i++)
> > +             TEST_ASSERT_EQ(mem[i], val);
> > +
> > +     ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
> > +                     page_size);
> > +     TEST_ASSERT(!ret, "fallocate the first page should succeed");
> > +
> > +     for (i = 0; i < page_size; i++)
> > +             TEST_ASSERT_EQ(mem[i], 0x00);
> > +     for (; i < total_size; i++)
> > +             TEST_ASSERT_EQ(mem[i], val);
> > +
> > +     memset(mem, val, total_size);
> > +     for (i = 0; i < total_size; i++)
> > +             TEST_ASSERT_EQ(mem[i], val);
> > +
>
> The last memset() and the check of the resident values look redundant because
> the same test has already been covered by the first memset(). If we really
> want to double-check that the page cache is writable, it would be enough to
> cover the first page. Otherwise, I guess this hunk of code can be removed :)

My goal was to check that it is in fact writable, and that it stores
the expected value after the PUNCH_HOLE. I'll limit it to the first
page.

>
>         memset(mem, val, page_size);
>         for (i = 0; i < page_size; i++)
>                 TEST_ASSERT_EQ(mem[i], val);
>
> > +     ret = munmap(mem, total_size);
> > +     TEST_ASSERT(!ret, "munmap should succeed");
> > +}
> > +
> > +static void test_mmap_denied(int fd, size_t page_size, size_t total_size)
> >   {
> >       char *mem;
> >
> >       mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> >       TEST_ASSERT_EQ(mem, MAP_FAILED);
> > +
> > +     mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +     TEST_ASSERT_EQ(mem, MAP_FAILED);
> >   }
> >
> >   static void test_file_size(int fd, size_t page_size, size_t total_size)
> > @@ -120,26 +154,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
> >       }
> >   }
> >
> > -static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
> > +static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
> > +                                               uint64_t guest_memfd_flags,
> > +                                               size_t page_size)
> >   {
> > -     size_t page_size = getpagesize();
> > -     uint64_t flag;
> >       size_t size;
> >       int fd;
> >
> >       for (size = 1; size < page_size; size++) {
> > -             fd = __vm_create_guest_memfd(vm, size, 0);
> > +             fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
> >               TEST_ASSERT(fd == -1 && errno == EINVAL,
> >                           "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
> >                           size);
> >       }
> > -
> > -     for (flag = BIT(0); flag; flag <<= 1) {
> > -             fd = __vm_create_guest_memfd(vm, page_size, flag);
> > -             TEST_ASSERT(fd == -1 && errno == EINVAL,
> > -                         "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> > -                         flag);
> > -     }
> >   }
> >
> >   static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> > @@ -170,30 +197,108 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> >       close(fd1);
> >   }
> >
> > -int main(int argc, char *argv[])
> > +static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
> > +                        bool expect_mmap_allowed)
> >   {
> > -     size_t page_size;
> > +     struct kvm_vm *vm;
> >       size_t total_size;
> > +     size_t page_size;
> >       int fd;
> > -     struct kvm_vm *vm;
> >
> > -     TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> > +     if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
> > +             return;
> >
>
> The check seems incorrect for aarch64, since kvm_check_cap() always returns 0
> there, so the test is skipped for VM_TYPE_DEFAULT on aarch64. It would need to
> be something like below:
>
>         #define VM_TYPE_DEFAULT         0
>
>         if (vm_type != VM_TYPE_DEFAULT &&
>             !(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
>                 return;

Ack.

Thanks for this, and for all the other reviews.

Cheers,
/fuad

> >       page_size = getpagesize();
> >       total_size = page_size * 4;
> >
> > -     vm = vm_create_barebones();
> > +     vm = vm_create_barebones_type(vm_type);
> >
> > -     test_create_guest_memfd_invalid(vm);
> >       test_create_guest_memfd_multiple(vm);
> > +     test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
> >
> > -     fd = vm_create_guest_memfd(vm, total_size, 0);
> > +     fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
> >
> >       test_file_read_write(fd);
> > -     test_mmap(fd, page_size);
> > +
> > +     if (expect_mmap_allowed)
> > +             test_mmap_allowed(fd, page_size, total_size);
> > +     else
> > +             test_mmap_denied(fd, page_size, total_size);
> > +
> >       test_file_size(fd, page_size, total_size);
> >       test_fallocate(fd, page_size, total_size);
> >       test_invalid_punch_hole(fd, page_size, total_size);
> >
> >       close(fd);
> > +     kvm_vm_release(vm);
> > +}
> > +
> > +static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
> > +                                         uint64_t expected_valid_flags)
> > +{
> > +     size_t page_size = getpagesize();
> > +     struct kvm_vm *vm;
> > +     uint64_t flag = 0;
> > +     int fd;
> > +
> > +     if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
> > +             return;
>
> Same as above

Ack.

> > +
> > +     vm = vm_create_barebones_type(vm_type);
> > +
> > +     for (flag = BIT(0); flag; flag <<= 1) {
> > +             fd = __vm_create_guest_memfd(vm, page_size, flag);
> > +
> > +             if (flag & expected_valid_flags) {
> > +                     TEST_ASSERT(fd > 0,
> > +                                 "guest_memfd() with flag '0x%lx' should be valid",
> > +                                 flag);
> > +                     close(fd);
> > +             } else {
> > +                     TEST_ASSERT(fd == -1 && errno == EINVAL,
> > +                                 "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> > +                                 flag);
>
> It's more robust to have:
>
>                         TEST_ASSERT(fd < 0 && errno == EINVAL, ...);

Ack.

> > +             }
> > +     }
> > +
> > +     kvm_vm_release(vm);
> > +}
> > +
> > +static void test_gmem_flag_validity(void)
> > +{
> > +     uint64_t non_coco_vm_valid_flags = 0;
> > +
> > +     if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
> > +             non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +
> > +     test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
> > +
> > +#ifdef __x86_64__
> > +     test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, non_coco_vm_valid_flags);
> > +     test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
> > +     test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
> > +     test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
> > +     test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
> > +#endif
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +     TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> > +
> > +     test_gmem_flag_validity();
> > +
> > +     test_with_type(VM_TYPE_DEFAULT, 0, false);
> > +     if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> > +             test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_SUPPORT_SHARED,
> > +                            true);
> > +     }
> > +
> > +#ifdef __x86_64__
> > +     test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
> > +     if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> > +             test_with_type(KVM_X86_SW_PROTECTED_VM,
> > +                            GUEST_MEMFD_FLAG_SUPPORT_SHARED, true);
> > +     }
> > +#endif
> >   }
>
> Thanks,
> Gavin
>



* Re: [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort()
  2025-05-21  2:25   ` Gavin Shan
@ 2025-05-21  9:57     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21  9:57 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Gavin,

On Wed, 21 May 2025 at 03:25, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Fuad,
>
> On 5/14/25 2:34 AM, Fuad Tabba wrote:
> > Guest memory can be backed by guest_memfd or by anonymous memory. Rename
> > vma_shift to page_shift and vma_pagesize to page_size to ease
> > readability in subsequent patches.
> >
> > Suggested-by: James Houghton <jthoughton@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/arm64/kvm/mmu.c | 54 ++++++++++++++++++++++----------------------
> >   1 file changed, 27 insertions(+), 27 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 9865ada04a81..d756c2b5913f 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1479,13 +1479,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >       phys_addr_t ipa = fault_ipa;
> >       struct kvm *kvm = vcpu->kvm;
> >       struct vm_area_struct *vma;
> > -     short vma_shift;
> > +     short page_shift;
> >       void *memcache;
> >       gfn_t gfn;
> >       kvm_pfn_t pfn;
> >       bool logging_active = memslot_is_logging(memslot);
> >       bool force_pte = logging_active || is_protected_kvm_enabled();
> > -     long vma_pagesize, fault_granule;
> > +     long page_size, fault_granule;
> >       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> >       struct kvm_pgtable *pgt;
> >       struct page *page;
>
> [...]
>
> >
> >       /*
> > @@ -1600,9 +1600,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >        * ensure we find the right PFN and lay down the mapping in the right
> >        * place.
> >        */
> > -     if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> > -             fault_ipa &= ~(vma_pagesize - 1);
> > -             ipa &= ~(vma_pagesize - 1);
> > +     if (page_size == PMD_SIZE || page_size == PUD_SIZE) {
> > +             fault_ipa &= ~(page_size - 1);
> > +             ipa &= ~(page_size - 1);
> >       }
> >
>
> nit: since we're here for readability, ALIGN_DOWN() may be used:
>
>                 fault_ipa = ALIGN_DOWN(fault_ipa, page_size);
>                 ipa = ALIGN_DOWN(ipa, page_size);

Yes, but that would be a change beyond just renaming. I could
introduce another patch to do that.

Thanks,
/fuad

> Thanks,
> Gavin
>



* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21  8:05   ` David Hildenbrand
@ 2025-05-21 10:12     ` Fuad Tabba
  2025-05-21 10:26       ` David Hildenbrand
  0 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21 10:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi David,

On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
>
> On 13.05.25 18:34, Fuad Tabba wrote:
> > Enable mapping guest_memfd in arm64. For now, it applies to all
> > VMs in arm64 that use guest_memfd. In the future, new VM types
> > can restrict this via kvm_arch_gmem_supports_shared_mem().
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
> >   arch/arm64/kvm/Kconfig            |  1 +
> >   2 files changed, 11 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 08ba91e6fb03..2514779f5131 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
> >       return true;
> >   }
> >
> > +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> > +{
> > +     return IS_ENABLED(CONFIG_KVM_GMEM);
> > +}
> > +
> > +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> > +{
> > +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> > +}
> > +
> >   #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 096e45acadb2..8c1e1964b46a 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -38,6 +38,7 @@ menuconfig KVM
> >       select HAVE_KVM_VCPU_RUN_PID_CHANGE
> >       select SCHED_INFO
> >       select GUEST_PERF_EVENTS if PERF_EVENTS
> > +     select KVM_GMEM_SHARED_MEM
> >       help
> >         Support hosting virtualized guest machines.
> >
>
> Do we have to reject somewhere if we are given a guest_memfd that was
> *not* created using the SHARED flag? Or will existing checks already
> reject that?

We don't reject, but I don't think we need to. A user can create a
guest_memfd that's private in arm64; it would just be useless.

Cheers,
/fuad
> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21 10:12     ` Fuad Tabba
@ 2025-05-21 10:26       ` David Hildenbrand
  2025-05-21 10:29         ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21 10:26 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 21.05.25 12:12, Fuad Tabba wrote:
> Hi David,
> 
> On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 13.05.25 18:34, Fuad Tabba wrote:
>>> Enable mapping guest_memfd in arm64. For now, it applies to all
>>> VMs in arm64 that use guest_memfd. In the future, new VM types
>>> can restrict this via kvm_arch_gmem_supports_shared_mem().
>>>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>>    arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
>>>    arch/arm64/kvm/Kconfig            |  1 +
>>>    2 files changed, 11 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 08ba91e6fb03..2514779f5131 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
>>>        return true;
>>>    }
>>>
>>> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>>> +{
>>> +     return IS_ENABLED(CONFIG_KVM_GMEM);
>>> +}
>>> +
>>> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
>>> +{
>>> +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
>>> +}
>>> +
>>>    #endif /* __ARM64_KVM_HOST_H__ */
>>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>>> index 096e45acadb2..8c1e1964b46a 100644
>>> --- a/arch/arm64/kvm/Kconfig
>>> +++ b/arch/arm64/kvm/Kconfig
>>> @@ -38,6 +38,7 @@ menuconfig KVM
>>>        select HAVE_KVM_VCPU_RUN_PID_CHANGE
>>>        select SCHED_INFO
>>>        select GUEST_PERF_EVENTS if PERF_EVENTS
>>> +     select KVM_GMEM_SHARED_MEM
>>>        help
>>>          Support hosting virtualized guest machines.
>>>
>>
>> Do we have to reject somewhere if we are given a guest_memfd that was
>> *not* created using the SHARED flag? Or will existing checks already
>> reject that?
> 
> We don't reject, but I don't think we need to. A user can create a
> guest_memfd that's private in arm64, it would just be useless.

But the arm64 fault routine would not be able to handle that properly, no?

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21 10:26       ` David Hildenbrand
@ 2025-05-21 10:29         ` Fuad Tabba
  2025-05-21 12:44           ` David Hildenbrand
  0 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21 10:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Wed, 21 May 2025 at 11:26, David Hildenbrand <david@redhat.com> wrote:
>
> On 21.05.25 12:12, Fuad Tabba wrote:
> > Hi David,
> >
> > On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 13.05.25 18:34, Fuad Tabba wrote:
> >>> Enable mapping guest_memfd in arm64. For now, it applies to all
> >>> VMs in arm64 that use guest_memfd. In the future, new VM types
> >>> can restrict this via kvm_arch_gmem_supports_shared_mem().
> >>>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>> ---
> >>>    arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
> >>>    arch/arm64/kvm/Kconfig            |  1 +
> >>>    2 files changed, 11 insertions(+)
> >>>
> >>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>> index 08ba91e6fb03..2514779f5131 100644
> >>> --- a/arch/arm64/include/asm/kvm_host.h
> >>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
> >>>        return true;
> >>>    }
> >>>
> >>> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> >>> +{
> >>> +     return IS_ENABLED(CONFIG_KVM_GMEM);
> >>> +}
> >>> +
> >>> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> >>> +{
> >>> +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> >>> +}
> >>> +
> >>>    #endif /* __ARM64_KVM_HOST_H__ */
> >>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> >>> index 096e45acadb2..8c1e1964b46a 100644
> >>> --- a/arch/arm64/kvm/Kconfig
> >>> +++ b/arch/arm64/kvm/Kconfig
> >>> @@ -38,6 +38,7 @@ menuconfig KVM
> >>>        select HAVE_KVM_VCPU_RUN_PID_CHANGE
> >>>        select SCHED_INFO
> >>>        select GUEST_PERF_EVENTS if PERF_EVENTS
> >>> +     select KVM_GMEM_SHARED_MEM
> >>>        help
> >>>          Support hosting virtualized guest machines.
> >>>
> >>
> >> Do we have to reject somewhere if we are given a guest_memfd that was
> >> *not* created using the SHARED flag? Or will existing checks already
> >> reject that?
> >
> > We don't reject, but I don't think we need to. A user can create a
> > guest_memfd that's private in arm64, it would just be useless.
>
> But the arm64 fault routine would not be able to handle that properly, no?

Actually it would. The function user_mem_abort() doesn't care whether
it's private or shared. It would fault it into the guest correctly
regardless.

Thanks,
/fuad

> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults
  2025-05-21  8:04   ` David Hildenbrand
@ 2025-05-21 11:10     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21 11:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi David,

On Wed, 21 May 2025 at 09:04, David Hildenbrand <david@redhat.com> wrote:
>
> On 13.05.25 18:34, Fuad Tabba wrote:
> > Add arm64 support for handling guest page faults on guest_memfd
> > backed memslots.
> >
> > For now, the fault granule is restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
>
> [...]
>
> > +     if (!is_gmem) {
>
> Should we add a comment somewhere, stating that we don't support VMs
> with private memory, so if we have a gmem, all faults are routed through
> that?

I guess this is related to the other thread we had. This would handle
private memory correctly. It's just that for arm64 as it is, having
private memory isn't that useful.

There might be a use-case where a user would create a
guest_memfd-backed slot that supports private memory, and one that
doesn't, which only the guest would use. I doubt that that's actually
useful, but it would work and behave as expected.

Cheers,
/fuad

> > +             mmap_read_lock(current->mm);
> > +             vma = vma_lookup(current->mm, hva);
> > +             if (unlikely(!vma)) {
> > +                     kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > +                     mmap_read_unlock(current->mm);
> > +                     return -EFAULT;
> > +             }
> > +
> > +             vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > +             mte_allowed = kvm_vma_mte_allowed(vma);
>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21 10:29         ` Fuad Tabba
@ 2025-05-21 12:44           ` David Hildenbrand
  2025-05-21 13:15             ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21 12:44 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 21.05.25 12:29, Fuad Tabba wrote:
> On Wed, 21 May 2025 at 11:26, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 21.05.25 12:12, Fuad Tabba wrote:
>>> Hi David,
>>>
>>> On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 13.05.25 18:34, Fuad Tabba wrote:
>>>>> Enable mapping guest_memfd in arm64. For now, it applies to all
>>>>> VMs in arm64 that use guest_memfd. In the future, new VM types
>>>>> can restrict this via kvm_arch_gmem_supports_shared_mem().
>>>>>
>>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>>>> ---
>>>>>     arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
>>>>>     arch/arm64/kvm/Kconfig            |  1 +
>>>>>     2 files changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>>> index 08ba91e6fb03..2514779f5131 100644
>>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>>> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
>>>>>         return true;
>>>>>     }
>>>>>
>>>>> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>>>>> +{
>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM);
>>>>> +}
>>>>> +
>>>>> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
>>>>> +{
>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
>>>>> +}
>>>>> +
>>>>>     #endif /* __ARM64_KVM_HOST_H__ */
>>>>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>>>>> index 096e45acadb2..8c1e1964b46a 100644
>>>>> --- a/arch/arm64/kvm/Kconfig
>>>>> +++ b/arch/arm64/kvm/Kconfig
>>>>> @@ -38,6 +38,7 @@ menuconfig KVM
>>>>>         select HAVE_KVM_VCPU_RUN_PID_CHANGE
>>>>>         select SCHED_INFO
>>>>>         select GUEST_PERF_EVENTS if PERF_EVENTS
>>>>> +     select KVM_GMEM_SHARED_MEM
>>>>>         help
>>>>>           Support hosting virtualized guest machines.
>>>>>
>>>>
>>>> Do we have to reject somewhere if we are given a guest_memfd that was
>>>> *not* created using the SHARED flag? Or will existing checks already
>>>> reject that?
>>>
>>> We don't reject, but I don't think we need to. A user can create a
>>> guest_memfd that's private in arm64, it would just be useless.
>>
>> But the arm64 fault routine would not be able to handle that properly, no?
> 
> Actually it would. The function user_mem_abort() doesn't care whether
> it's private or shared. It would fault it into the guest correctly
> regardless.


I think what I meant is that: if it's !shared (private only), shared 
accesses (IOW all access without CoCo) should be taken from the user 
space mapping.

But user_mem_abort() would blindly go to kvm_gmem_get_pfn() because 
"is_gmem = kvm_slot_has_gmem(memslot) = true".

In other words, arm64 would have to *ignore* guest_memfd that does not 
support shared?

That's why I was wondering whether we should just immediately refuse 
such guest_memfds.


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21 12:44           ` David Hildenbrand
@ 2025-05-21 13:15             ` Fuad Tabba
  2025-05-21 13:21               ` David Hildenbrand
  0 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21 13:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi David,

On Wed, 21 May 2025 at 13:44, David Hildenbrand <david@redhat.com> wrote:
>
> On 21.05.25 12:29, Fuad Tabba wrote:
> > On Wed, 21 May 2025 at 11:26, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 21.05.25 12:12, Fuad Tabba wrote:
> >>> Hi David,
> >>>
> >>> On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 13.05.25 18:34, Fuad Tabba wrote:
> >>>>> Enable mapping guest_memfd in arm64. For now, it applies to all
> >>>>> VMs in arm64 that use guest_memfd. In the future, new VM types
> >>>>> can restrict this via kvm_arch_gmem_supports_shared_mem().
> >>>>>
> >>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>>>> ---
> >>>>>     arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
> >>>>>     arch/arm64/kvm/Kconfig            |  1 +
> >>>>>     2 files changed, 11 insertions(+)
> >>>>>
> >>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>>>> index 08ba91e6fb03..2514779f5131 100644
> >>>>> --- a/arch/arm64/include/asm/kvm_host.h
> >>>>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>>>> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
> >>>>>         return true;
> >>>>>     }
> >>>>>
> >>>>> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> >>>>> +{
> >>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM);
> >>>>> +}
> >>>>> +
> >>>>> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> >>>>> +{
> >>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> >>>>> +}
> >>>>> +
> >>>>>     #endif /* __ARM64_KVM_HOST_H__ */
> >>>>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> >>>>> index 096e45acadb2..8c1e1964b46a 100644
> >>>>> --- a/arch/arm64/kvm/Kconfig
> >>>>> +++ b/arch/arm64/kvm/Kconfig
> >>>>> @@ -38,6 +38,7 @@ menuconfig KVM
> >>>>>         select HAVE_KVM_VCPU_RUN_PID_CHANGE
> >>>>>         select SCHED_INFO
> >>>>>         select GUEST_PERF_EVENTS if PERF_EVENTS
> >>>>> +     select KVM_GMEM_SHARED_MEM
> >>>>>         help
> >>>>>           Support hosting virtualized guest machines.
> >>>>>
> >>>>
> >>>> Do we have to reject somewhere if we are given a guest_memfd that was
> >>>> *not* created using the SHARED flag? Or will existing checks already
> >>>> reject that?
> >>>
> >>> We don't reject, but I don't think we need to. A user can create a
> >>> guest_memfd that's private in arm64, it would just be useless.
> >>
> >> But the arm64 fault routine would not be able to handle that properly, no?
> >
> > Actually it would. The function user_mem_abort() doesn't care whether
> > it's private or shared. It would fault it into the guest correctly
> > regardless.
>
>
> I think what I meant is that: if it's !shared (private only), shared
> accesses (IOW all access without CoCo) should be taken from the user
> space mapping.
>
> But user_mem_abort() would blindly go to kvm_gmem_get_pfn() because
> "is_gmem = kvm_slot_has_gmem(memslot) = true".

Yes, since it is a gmem-backed slot.

> In other words, arm64 would have to *ignore* guest_memfd that does not
> support shared?
>
> That's why I was wondering whether we should just immediately refuse
> such guest_memfds.

My thinking is that if a user deliberately creates a
guest_memfd-backed slot without designating it as being sharable, then
either they would find out when they try to map that memory to the
host userspace (mapping it would fail), or it could be that they
deliberately want to set up a VM with memslots that are not mappable at
all by the host. Perhaps to add some layer of security (although a
very flimsy one, since it's not a confidential guest).

I'm happy to add a check to prevent this. The question is, how to do it
exactly (I assume it would be in kvm_gmem_create())? Would it be
arch-specific, i.e., prevent arm64 from creating non-shared
guest_memfd backed memslots? Or do it by VM type? Even if we do it by
VM-type it would need to be arch-specific, since we allow private
guest_memfd slots for the default VM in x86, but we wouldn't for
arm64.

We could add another function, along the lines of
kvm_arch_supports_gmem_only_shared_mem(), but considering that it
actually works, and (arguably) would behave as intended, I'm not sure
if it's worth the complexity.

What do you think?

Cheers,
/fuad

>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21 13:15             ` Fuad Tabba
@ 2025-05-21 13:21               ` David Hildenbrand
  2025-05-21 13:32                 ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21 13:21 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 21.05.25 15:15, Fuad Tabba wrote:
> Hi David,
> 
> On Wed, 21 May 2025 at 13:44, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 21.05.25 12:29, Fuad Tabba wrote:
>>> On Wed, 21 May 2025 at 11:26, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 21.05.25 12:12, Fuad Tabba wrote:
>>>>> Hi David,
>>>>>
>>>>> On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 13.05.25 18:34, Fuad Tabba wrote:
>>>>>>> Enable mapping guest_memfd in arm64. For now, it applies to all
>>>>>>> VMs in arm64 that use guest_memfd. In the future, new VM types
>>>>>>> can restrict this via kvm_arch_gmem_supports_shared_mem().
>>>>>>>
>>>>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>>>>>> ---
>>>>>>>      arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
>>>>>>>      arch/arm64/kvm/Kconfig            |  1 +
>>>>>>>      2 files changed, 11 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>>>>> index 08ba91e6fb03..2514779f5131 100644
>>>>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>>>>> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
>>>>>>>          return true;
>>>>>>>      }
>>>>>>>
>>>>>>> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>>>>>>> +{
>>>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
>>>>>>> +{
>>>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
>>>>>>> +}
>>>>>>> +
>>>>>>>      #endif /* __ARM64_KVM_HOST_H__ */
>>>>>>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>>>>>>> index 096e45acadb2..8c1e1964b46a 100644
>>>>>>> --- a/arch/arm64/kvm/Kconfig
>>>>>>> +++ b/arch/arm64/kvm/Kconfig
>>>>>>> @@ -38,6 +38,7 @@ menuconfig KVM
>>>>>>>          select HAVE_KVM_VCPU_RUN_PID_CHANGE
>>>>>>>          select SCHED_INFO
>>>>>>>          select GUEST_PERF_EVENTS if PERF_EVENTS
>>>>>>> +     select KVM_GMEM_SHARED_MEM
>>>>>>>          help
>>>>>>>            Support hosting virtualized guest machines.
>>>>>>>
>>>>>>
>>>>>> Do we have to reject somewhere if we are given a guest_memfd that was
>>>>>> *not* created using the SHARED flag? Or will existing checks already
>>>>>> reject that?
>>>>>
>>>>> We don't reject, but I don't think we need to. A user can create a
>>>>> guest_memfd that's private in arm64, it would just be useless.
>>>>
>>>> But the arm64 fault routine would not be able to handle that properly, no?
>>>
>>> Actually it would. The function user_mem_abort() doesn't care whether
>>> it's private or shared. It would fault it into the guest correctly
>>> regardless.
>>
>>
>> I think what I meant is that: if it's !shared (private only), shared
>> accesses (IOW all access without CoCo) should be taken from the user
>> space mapping.
>>
>> But user_mem_abort() would blindly go to kvm_gmem_get_pfn() because
>> "is_gmem = kvm_slot_has_gmem(memslot) = true".
> 
> Yes, since it is a gmem-backed slot.
> 
>> In other words, arm64 would have to *ignore* guest_memfd that does not
>> support shared?
>>
>> That's why I was wondering whether we should just immediately refuse
>> such guest_memfds.
> 
> My thinking is that if a user deliberately creates a
> guest_memfd-backed slot without designating it as being sharable, then
> either they would find out when they try to map that memory to the
> host userspace (mapping it would fail), or it could be that they
> deliberately want to set up a VM with memslots that are not mappable at
> all by the host. 

Hm. But that would mean that we interpret "private" memory as a concept 
that is not understood by the VM. Because the VM does not know what 
"private" memory is ...

> Perhaps to add some layer of security (although a
> very flimsy one, since it's not a confidential guest).

Exactly my point. If you don't want to mmap it then ... don't mmap it :)

> 
> I'm happy to add a check to prevent this. The question is, how to do it
> exactly (I assume it would be in kvm_gmem_create())? Would it be
> arch-specific, i.e., prevent arm64 from creating non-shared
> guest_memfd backed memslots? Or do it by VM type? Even if we do it by
> VM-type it would need to be arch-specific, since we allow private
> guest_memfd slots for the default VM in x86, but we wouldn't for
> arm64.
> 
> We could add another function, along the lines of
> kvm_arch_supports_gmem_only_shared_mem(), but considering that it
> actually works, and (arguably) would behave as intended, I'm not sure
> if it's worth the complexity.
> 
> What do you think?

My thinking was to either block this at slot creation time or at 
guest_memfd creation time. And we should probably block that for other 
VM types as well that do not support private memory?

I mean, creating guest_memfd for private memory when there is no concept 
of private memory for the VM is ... weird, no? :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21 13:21               ` David Hildenbrand
@ 2025-05-21 13:32                 ` Fuad Tabba
  2025-05-21 13:45                   ` David Hildenbrand
  0 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21 13:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi David,

On Wed, 21 May 2025 at 14:22, David Hildenbrand <david@redhat.com> wrote:
>
> On 21.05.25 15:15, Fuad Tabba wrote:
> > Hi David,
> >
> > On Wed, 21 May 2025 at 13:44, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 21.05.25 12:29, Fuad Tabba wrote:
> >>> On Wed, 21 May 2025 at 11:26, David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 21.05.25 12:12, Fuad Tabba wrote:
> >>>>> Hi David,
> >>>>>
> >>>>> On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
> >>>>>>
> >>>>>> On 13.05.25 18:34, Fuad Tabba wrote:
> >>>>>>> Enable mapping guest_memfd in arm64. For now, it applies to all
> >>>>>>> VMs in arm64 that use guest_memfd. In the future, new VM types
> >>>>>>> can restrict this via kvm_arch_gmem_supports_shared_mem().
> >>>>>>>
> >>>>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>>>>>> ---
> >>>>>>>      arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
> >>>>>>>      arch/arm64/kvm/Kconfig            |  1 +
> >>>>>>>      2 files changed, 11 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>>>>>> index 08ba91e6fb03..2514779f5131 100644
> >>>>>>> --- a/arch/arm64/include/asm/kvm_host.h
> >>>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>>>>>> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
> >>>>>>>          return true;
> >>>>>>>      }
> >>>>>>>
> >>>>>>> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> >>>>>>> +{
> >>>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> >>>>>>> +{
> >>>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>      #endif /* __ARM64_KVM_HOST_H__ */
> >>>>>>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> >>>>>>> index 096e45acadb2..8c1e1964b46a 100644
> >>>>>>> --- a/arch/arm64/kvm/Kconfig
> >>>>>>> +++ b/arch/arm64/kvm/Kconfig
> >>>>>>> @@ -38,6 +38,7 @@ menuconfig KVM
> >>>>>>>          select HAVE_KVM_VCPU_RUN_PID_CHANGE
> >>>>>>>          select SCHED_INFO
> >>>>>>>          select GUEST_PERF_EVENTS if PERF_EVENTS
> >>>>>>> +     select KVM_GMEM_SHARED_MEM
> >>>>>>>          help
> >>>>>>>            Support hosting virtualized guest machines.
> >>>>>>>
> >>>>>>
> >>>>>> Do we have to reject somewhere if we are given a guest_memfd that was
> >>>>>> *not* created using the SHARED flag? Or will existing checks already
> >>>>>> reject that?
> >>>>>
> >>>>> We don't reject, but I don't think we need to. A user can create a
> >>>>> guest_memfd that's private in arm64, it would just be useless.
> >>>>
> >>>> But the arm64 fault routine would not be able to handle that properly, no?
> >>>
> >>> Actually it would. The function user_mem_abort() doesn't care whether
> >>> it's private or shared. It would fault it into the guest correctly
> >>> regardless.
> >>
> >>
> >> I think what I meant is that: if it's !shared (private only), shared
> >> accesses (IOW all access without CoCo) should be taken from the user
> >> space mapping.
> >>
> >> But user_mem_abort() would blindly go to kvm_gmem_get_pfn() because
> >> "is_gmem = kvm_slot_has_gmem(memslot) = true".
> >
> > Yes, since it is a gmem-backed slot.
> >
> >> In other words, arm64 would have to *ignore* guest_memfd that does not
> >> support shared?
> >>
> >> That's why I was wondering whether we should just immediately refuse
> >> such guest_memfds.
> >
> > My thinking is that if a user deliberately creates a
> > guest_memfd-backed slot without designating it as being sharable, then
> > either they would find out when they try to map that memory to the
> > host userspace (mapping it would fail), or it could be that they
> > deliberately want to set up a VM with memslots that are not mappable at
> > all by the host.
>
> Hm. But that would mean that we interpret "private" memory as a concept
> that is not understood by the VM. Because the VM does not know what
> "private" memory is ...
>
> > Perhaps to add some layer of security (although a
> > very flimsy one, since it's not a confidential guest).
>
> Exactly my point. If you don't want to mmap it then ... don't mmap it :)
>
> >
> > I'm happy to add a check to prevent this. The question is, how to do it
> > exactly (I assume it would be in kvm_gmem_create())? Would it be
> > arch-specific, i.e., prevent arm64 from creating non-shared
> > guest_memfd backed memslots? Or do it by VM type? Even if we do it by
> > VM-type it would need to be arch-specific, since we allow private
> > guest_memfd slots for the default VM in x86, but we wouldn't for
> > arm64.
> >
> > We could add another function, along the lines of
> > kvm_arch_supports_gmem_only_shared_mem(), but considering that it
> > actually works, and (arguably) would behave as intended, I'm not sure
> > if it's worth the complexity.
> >
> > What do you think?
>
> My thinking was to either block this at slot creation time or at
> guest_memfd creation time. And we should probably block that for other
> VM types as well that do not support private memory?
>
> I mean, creating guest_memfd for private memory when there is no concept
> of private memory for the VM is ... weird, no? :)

Actually, I could add this as an arch-specific check in
arch/arm64/kvm/mmu.c:kvm_arch_prepare_memory_region(). That way, core
KVM/guest_memfd code doesn't need to handle this arm64-specific behavior.

Does that sound good?

Thanks,
/fuad


> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21 13:32                 ` Fuad Tabba
@ 2025-05-21 13:45                   ` David Hildenbrand
  2025-05-21 14:14                     ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-21 13:45 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 21.05.25 15:32, Fuad Tabba wrote:
> Hi David,
> 
> On Wed, 21 May 2025 at 14:22, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 21.05.25 15:15, Fuad Tabba wrote:
>>> Hi David,
>>>
>>> On Wed, 21 May 2025 at 13:44, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 21.05.25 12:29, Fuad Tabba wrote:
>>>>> On Wed, 21 May 2025 at 11:26, David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 21.05.25 12:12, Fuad Tabba wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 13.05.25 18:34, Fuad Tabba wrote:
>>>>>>>>> Enable mapping guest_memfd in arm64. For now, it applies to all
>>>>>>>>> VMs in arm64 that use guest_memfd. In the future, new VM types
>>>>>>>>> can restrict this via kvm_arch_gmem_supports_shared_mem().
>>>>>>>>>
>>>>>>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>>>>>>>> ---
>>>>>>>>>       arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
>>>>>>>>>       arch/arm64/kvm/Kconfig            |  1 +
>>>>>>>>>       2 files changed, 11 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>>>>>>> index 08ba91e6fb03..2514779f5131 100644
>>>>>>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>>>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>>>>>>> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
>>>>>>>>>           return true;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>>>>>>>>> +{
>>>>>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
>>>>>>>>> +{
>>>>>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>       #endif /* __ARM64_KVM_HOST_H__ */
>>>>>>>>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>>>>>>>>> index 096e45acadb2..8c1e1964b46a 100644
>>>>>>>>> --- a/arch/arm64/kvm/Kconfig
>>>>>>>>> +++ b/arch/arm64/kvm/Kconfig
>>>>>>>>> @@ -38,6 +38,7 @@ menuconfig KVM
>>>>>>>>>           select HAVE_KVM_VCPU_RUN_PID_CHANGE
>>>>>>>>>           select SCHED_INFO
>>>>>>>>>           select GUEST_PERF_EVENTS if PERF_EVENTS
>>>>>>>>> +     select KVM_GMEM_SHARED_MEM
>>>>>>>>>           help
>>>>>>>>>             Support hosting virtualized guest machines.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Do we have to reject somewhere if we are given a guest_memfd that was
>>>>>>>> *not* created using the SHARED flag? Or will existing checks already
>>>>>>>> reject that?
>>>>>>>
>>>>>>> We don't reject, but I don't think we need to. A user can create a
>>>>>>> guest_memfd that's private in arm64, it would just be useless.
>>>>>>
>>>>>> But the arm64 fault routine would not be able to handle that properly, no?
>>>>>
>>>>> Actually it would. The function user_mem_abort() doesn't care whether
>>>>> it's private or shared. It would fault it into the guest correctly
>>>>> regardless.
>>>>
>>>>
>>>> I think what I meant is that: if it's !shared (private only), shared
>>>> accesses (IOW all access without CoCo) should be taken from the user
>>>> space mapping.
>>>>
>>>> But user_mem_abort() would blindly go to kvm_gmem_get_pfn() because
>>>> "is_gmem = kvm_slot_has_gmem(memslot) = true".
>>>
>>> Yes, since it is a gmem-backed slot.
>>>
>>>> In other words, arm64 would have to *ignore* guest_memfd that does not
>>>> support shared?
>>>>
>>>> That's why I was wondering whether we should just immediately refuse
>>>> such guest_memfds.
>>>
>>> My thinking is that if a user deliberately creates a
>>> guest_memfd-backed slot without designating it as being sharable, then
>>> either they would find out when they try to map that memory to the
>>> host userspace (mapping it would fail), or it could be that they
> >>> deliberately want to set up a VM with memslots that are not mappable at
>>> all by the host.
>>
> >> Hm. But that would mean that we interpret "private" memory as a concept
>> that is not understood by the VM. Because the VM does not know what
>> "private" memory is ...
>>
>>> Perhaps to add some layer of security (although a
>>> very flimsy one, since it's not a confidential guest).
>>
>> Exactly my point. If you don't want to mmap it then ... don't mmap it :)
>>
>>>
> >>> I'm happy to add a check to prevent this. The question is, how to do it
>>> exactly (I assume it would be in kvm_gmem_create())? Would it be
>>> arch-specific, i.e., prevent arm64 from creating non-shared
>>> guest_memfd backed memslots? Or do it by VM type? Even if we do it by
>>> VM-type it would need to be arch-specific, since we allow private
>>> guest_memfd slots for the default VM in x86, but we wouldn't for
>>> arm64.
>>>
>>> We could add another function, along the lines of
>>> kvm_arch_supports_gmem_only_shared_mem(), but considering that it
>>> actually works, and (arguably) would behave as intended, I'm not sure
>>> if it's worth the complexity.
>>>
>>> What do you think?
>>
>> My thinking was to either block this at slot creation time or at
>> guest_memfd creation time. And we should probably block that for other
>> VM types as well that do not support private memory?
>>
>> I mean, creating guest_memfd for private memory when there is no concept
>> of private memory for the VM is ... weird, no? :)
> 
> Actually, I could add this as an arch-specific check in
> arch/arm64/kvm/mmu.c:kvm_arch_prepare_memory_region(). That way, core
> KVM/guest_memfd code doesn't need to handle this arm64-specific behavior.
> 
> Does that sound good?

Yes, but only do so if you agree.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-05-21 13:45                   ` David Hildenbrand
@ 2025-05-21 14:14                     ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-21 14:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Wed, 21 May 2025 at 14:45, David Hildenbrand <david@redhat.com> wrote:
>
> On 21.05.25 15:32, Fuad Tabba wrote:
> > Hi David,
> >
> > On Wed, 21 May 2025 at 14:22, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 21.05.25 15:15, Fuad Tabba wrote:
> >>> Hi David,
> >>>
> >>> On Wed, 21 May 2025 at 13:44, David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 21.05.25 12:29, Fuad Tabba wrote:
> >>>>> On Wed, 21 May 2025 at 11:26, David Hildenbrand <david@redhat.com> wrote:
> >>>>>>
> >>>>>> On 21.05.25 12:12, Fuad Tabba wrote:
> >>>>>>> Hi David,
> >>>>>>>
> >>>>>>> On Wed, 21 May 2025 at 09:05, David Hildenbrand <david@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>> On 13.05.25 18:34, Fuad Tabba wrote:
> >>>>>>>>> Enable mapping guest_memfd in arm64. For now, it applies to all
> >>>>>>>>> VMs in arm64 that use guest_memfd. In the future, new VM types
> >>>>>>>>> can restrict this via kvm_arch_gmem_supports_shared_mem().
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>>>>>>>> ---
> >>>>>>>>>       arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
> >>>>>>>>>       arch/arm64/kvm/Kconfig            |  1 +
> >>>>>>>>>       2 files changed, 11 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>>>>>>>> index 08ba91e6fb03..2514779f5131 100644
> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_host.h
> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>>>>>>>> @@ -1593,4 +1593,14 @@ static inline bool kvm_arch_has_irq_bypass(void)
> >>>>>>>>>           return true;
> >>>>>>>>>       }
> >>>>>>>>>
> >>>>>>>>> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> >>>>>>>>> +{
> >>>>>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
> >>>>>>>>> +{
> >>>>>>>>> +     return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>       #endif /* __ARM64_KVM_HOST_H__ */
> >>>>>>>>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> >>>>>>>>> index 096e45acadb2..8c1e1964b46a 100644
> >>>>>>>>> --- a/arch/arm64/kvm/Kconfig
> >>>>>>>>> +++ b/arch/arm64/kvm/Kconfig
> >>>>>>>>> @@ -38,6 +38,7 @@ menuconfig KVM
> >>>>>>>>>           select HAVE_KVM_VCPU_RUN_PID_CHANGE
> >>>>>>>>>           select SCHED_INFO
> >>>>>>>>>           select GUEST_PERF_EVENTS if PERF_EVENTS
> >>>>>>>>> +     select KVM_GMEM_SHARED_MEM
> >>>>>>>>>           help
> >>>>>>>>>             Support hosting virtualized guest machines.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Do we have to reject somewhere if we are given a guest_memfd that was
> >>>>>>>> *not* created using the SHARED flag? Or will existing checks already
> >>>>>>>> reject that?
> >>>>>>>
> >>>>>>> We don't reject, but I don't think we need to. A user can create a
> >>>>>>> guest_memfd that's private in arm64, it would just be useless.
> >>>>>>
> >>>>>> But the arm64 fault routine would not be able to handle that properly, no?
> >>>>>
> >>>>> Actually it would. The function user_mem_abort() doesn't care whether
> >>>>> it's private or shared. It would fault it into the guest correctly
> >>>>> regardless.
> >>>>
> >>>>
> >>>> I think what I meant is that: if it's !shared (private only), shared
> >>>> accesses (IOW all access without CoCo) should be taken from the user
> >>>> space mapping.
> >>>>
> >>>> But user_mem_abort() would blindly go to kvm_gmem_get_pfn() because
> >>>> "is_gmem = kvm_slot_has_gmem(memslot) = true".
> >>>
> >>> Yes, since it is a gmem-backed slot.
> >>>
> >>>> In other words, arm64 would have to *ignore* guest_memfd that does not
> >>>> support shared?
> >>>>
> >>>> That's why I was wondering whether we should just immediately refuse
> >>>> such guest_memfds.
> >>>
> >>> My thinking is that if a user deliberately creates a
> >>> guest_memfd-backed slot without designating it as being sharable, then
> >>> either they would find out when they try to map that memory to the
> >>> host userspace (mapping it would fail), or it could be that they
> >>> deliberately want to set up a VM with memslots that are not mappable at
> >>> all by the host.
> >>
> >> Hm. But that would mean that we interpret "private" memory as a concept
> >> that is not understood by the VM. Because the VM does not know what
> >> "private" memory is ...
> >>
> >>> Perhaps to add some layer of security (although a
> >>> very flimsy one, since it's not a confidential guest).
> >>
> >> Exactly my point. If you don't want to mmap it then ... don't mmap it :)
> >>
> >>>
> >>> I'm happy to add a check to prevent this. The question is, how to do it
> >>> exactly (I assume it would be in kvm_gmem_create())? Would it be
> >>> arch-specific, i.e., prevent arm64 from creating non-shared
> >>> guest_memfd backed memslots? Or do it by VM type? Even if we do it by
> >>> VM-type it would need to be arch-specific, since we allow private
> >>> guest_memfd slots for the default VM in x86, but we wouldn't for
> >>> arm64.
> >>>
> >>> We could add another function, along the lines of
> >>> kvm_arch_supports_gmem_only_shared_mem(), but considering that it
> >>> actually works, and (arguably) would behave as intended, I'm not sure
> >>> if it's worth the complexity.
> >>>
> >>> What do you think?
> >>
> >> My thinking was to either block this at slot creation time or at
> >> guest_memfd creation time. And we should probably block that for other
> >> VM types as well that do not support private memory?
> >>
> >> I mean, creating guest_memfd for private memory when there is no concept
> >> of private memory for the VM is ... weird, no? :)
> >
> > Actually, I could add this as an arch-specific check in
> > arch/arm64/kvm/mmu.c:kvm_arch_prepare_memory_region(). That way, core
> > KVM/guest_memfd code doesn't need to handle this arm64-specific behavior.
> >
> > Does that sound good?
>
> Yes, but only do so if you agree.

Ack :)

/fuad

> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-05-21  7:48   ` David Hildenbrand
@ 2025-05-22  0:40     ` Ackerley Tng
  2025-05-22  7:16       ` David Hildenbrand
  0 siblings, 1 reply; 88+ messages in thread
From: Ackerley Tng @ 2025-05-22  0:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

David Hildenbrand <david@redhat.com> writes:

> On 13.05.25 18:34, Fuad Tabba wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>>
>> For memslots backed by guest_memfd with shared mem support, the KVM MMU
>> always faults-in pages from guest_memfd, and not from the userspace_addr.
>> Towards this end, this patch also introduces a new guest_memfd flag,
>> GUEST_MEMFD_FLAG_SUPPORT_SHARED, which indicates that the guest_memfd
>> instance supports in-place shared memory.
>>
>> This flag is only supported if the VM creating the guest_memfd instance
>> belongs to certain types determined by architecture. Only non-CoCo VMs
>> are permitted to use guest_memfd with shared mem, for now.
>>
>> Function names have also been updated for accuracy -
>> kvm_mem_is_private() returns true only when the current private/shared
>> state (in the CoCo sense) of the memory is private, and returns false if
>> the current state is shared explicitly or impicitly, e.g., belongs to a
>> non-CoCo VM.
>>
>> kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used
>> to fault in not just private memory, but more generally, from
>> guest_memfd.
>>
>> Co-developed-by: Fuad Tabba <tabba@google.com>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>> Co-developed-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>> ---
>
>
> [...]
>
>> +
>>   #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
>>   static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>>   {
>> @@ -2515,10 +2524,30 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
>>   bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>>   					 struct kvm_gfn_range *range);
>>
>> +/*
>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>> + * private.
>> + *
>> + * A return value of false indicates that the gfn is explicitly or implicity
>
> s/implicity/implicitly/
>

Thanks!

>> + * shared (i.e., non-CoCo VMs).
>> + */
>>   static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>   {
>> -	return IS_ENABLED(CONFIG_KVM_GMEM) &&
>> -	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> +	struct kvm_memory_slot *slot;
>> +
>> +	if (!IS_ENABLED(CONFIG_KVM_GMEM))
>> +		return false;
>> +
>> +	slot = gfn_to_memslot(kvm, gfn);
>> +	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>> +		/*
>> +		 * For now, memslots only support in-place shared memory if the
>> +		 * host is allowed to mmap memory (i.e., non-Coco VMs).
>> +		 */
>
> Not accurate: there is no in-place conversion support in this series,
> because there is no such interface. So the reason is that all memory is
> shared for these VM types?
>

True that there's no in-place conversion yet.

In this patch series, guest_memfd memslots support shared memory only
for specific VM types (on x86, that would be KVM_X86_DEFAULT_VM and
KVM_X86_SW_PROTECTED_VMs).

How about this wording:

Without conversion support, if the guest_memfd memslot supports shared
memory, all memory must be used as not private (implicitly shared).

>> +		return false;
>> +	}
>> +
>> +	return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>   }
>>   #else
>>   static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 2f499021df66..fe0245335c96 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -388,6 +388,23 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>
>>   	return 0;
>>   }
>> +
>> +bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
>> +{
>> +	struct file *file;
>> +	bool ret;
>> +
>> +	file = kvm_gmem_get_file((struct kvm_memory_slot *)slot);
>> +	if (!file)
>> +		return false;
>> +
>> +	ret = kvm_gmem_supports_shared(file_inode(file));
>> +
>> +	fput(file);
>> +	return ret;
>
> Would it make sense to cache that information in the memslot, to avoid
> the get/put?
>
> We could simply cache when creating the memslot I guess.
>

When I wrote it I was assuming that to ensure correctness we should
check with guest memfd, like what if someone closed the gmem file in the
middle of the fault path?

But I guess after the discussion at the last call, since the faulting
process is long and racy, if this check passed and we go to guest memfd
and the file was closed, it would just fail so I guess caching is fine.

> As an alternative ... could we simple get/put when managing the memslot?

What does a simple get/put mean here?


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-21  8:01   ` David Hildenbrand
@ 2025-05-22  0:45     ` Ackerley Tng
  2025-05-22 13:22       ` Sean Christopherson
  2025-05-22  7:22     ` Fuad Tabba
  1 sibling, 1 reply; 88+ messages in thread
From: Ackerley Tng @ 2025-05-22  0:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

David Hildenbrand <david@redhat.com> writes:

> On 13.05.25 18:34, Fuad Tabba wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>> 
>> This patch adds kvm_gmem_max_mapping_level(), which always returns
>> PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
>> 
>> When guest_memfd supports shared memory, max_mapping_level (especially
>> when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
>> from recover_huge_pages_range()) should take input from
>> guest_memfd.
>> 
>> Input from guest_memfd should be taken in these cases:
>> 
>> + if the memslot supports shared memory (guest_memfd is used for
>>    shared memory, or in future both shared and private memory) or
>> + if the memslot is only used for private memory and that gfn is
>>    private.
>> 
>> If the memslot doesn't use guest_memfd, figure out the
>> max_mapping_level using the host page tables like before.
>> 
>> This patch also refactors and inlines the other call to
>> __kvm_mmu_max_mapping_level().
>> 
>> In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
>> provided (if applicable) in fault->max_level. Hence, there is no need
>> to query guest_memfd.
>> 
>> lpage_info is queried like before, and then if the fault is not from
>> guest_memfd, adjust fault->req_level based on input from host page
>> tables.
>> 
>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>> ---
>>   arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
>>   include/linux/kvm_host.h |  7 +++
>>   virt/kvm/guest_memfd.c   | 12 ++++++
>>   3 files changed, 79 insertions(+), 32 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index cfbb471f7c70..9e0bc8114859 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>>   	return level;
>>   }
> [...]
>
>>   static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>>   					    struct kvm_page_fault *fault,
>>   					    int order)
>> @@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>>   {
>>   	unsigned int foll = fault->write ? FOLL_WRITE : 0;
>>   
>> -	if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
>> +	if (fault_from_gmem(fault))
>
> Should this change rather have been done in the previous patch?
>
> (then only adjust fault_from_gmem() in this function as required)
>

Yes, that is a good idea, thanks!

>>   		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>>   
>>   	foll |= FOLL_NOWAIT;
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index de7b46ee1762..f9bb025327c3 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>   int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>>   		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>>   		     int *max_order);
>> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
>>   #else
>>   static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>>   				   struct kvm_memory_slot *slot, gfn_t gfn,
>> @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>>   	KVM_BUG_ON(1, kvm);
>>   	return -EIO;
>>   }
>> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
>> +					 gfn_t gfn)
>
> Probably should indent with two tabs here.

Yup!


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-05-22  0:40     ` Ackerley Tng
@ 2025-05-22  7:16       ` David Hildenbrand
  2025-05-22  7:46         ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-22  7:16 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny


>>> + * shared (i.e., non-CoCo VMs).
>>> + */
>>>    static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>    {
>>> -	return IS_ENABLED(CONFIG_KVM_GMEM) &&
>>> -	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>> +	struct kvm_memory_slot *slot;
>>> +
>>> +	if (!IS_ENABLED(CONFIG_KVM_GMEM))
>>> +		return false;
>>> +
>>> +	slot = gfn_to_memslot(kvm, gfn);
>>> +	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>>> +		/*
>>> +		 * For now, memslots only support in-place shared memory if the
>>> +		 * host is allowed to mmap memory (i.e., non-Coco VMs).
>>> +		 */
>>
>> Not accurate: there is no in-place conversion support in this series,
>> because there is no such interface. So the reason is that all memory is
>> shared for these VM types?
>>
> 
> True that there's no in-place conversion yet.
> 
> In this patch series, guest_memfd memslots support shared memory only
> for specific VM types (on x86, that would be KVM_X86_DEFAULT_VM and
> KVM_X86_SW_PROTECTED_VMs).
> 
> How about this wording:
> 
> Without conversion support, if the guest_memfd memslot supports shared
> memory, all memory must be used as not private (implicitly shared).
> 

LGTM

>>> +		return false;
>>> +	}
>>> +
>>> +	return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>>    }
>>>    #else
>>>    static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index 2f499021df66..fe0245335c96 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -388,6 +388,23 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>>
>>>    	return 0;
>>>    }
>>> +
>>> +bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
>>> +{
>>> +	struct file *file;
>>> +	bool ret;
>>> +
>>> +	file = kvm_gmem_get_file((struct kvm_memory_slot *)slot);
>>> +	if (!file)
>>> +		return false;
>>> +
>>> +	ret = kvm_gmem_supports_shared(file_inode(file));
>>> +
>>> +	fput(file);
>>> +	return ret;
>>
>> Would it make sense to cache that information in the memslot, to avoid
>> the get/put?
>>
>> We could simply cache when creating the memslot I guess.
>>
> 
> When I wrote it I was assuming that to ensure correctness we should
> check with guest memfd, like what if someone closed the gmem file in the
> middle of the fault path?
> 
> But I guess after the discussion at the last call, since the faulting
> process is long and racy, if this check passed and we go to guest memfd
> and the file was closed, it would just fail so I guess caching is fine.

Yes, that would be my assumption. I mean, we also must make sure that if 
the user does something stupid like that, that we won't trigger other 
undesired code paths (like, suddenly the guest_memfd being !shared).

> 
>> As an alternative ... could we simple get/put when managing the memslot?
> 
> What does a simple get/put mean here?

s/simple/simply/

So when we create the memslot, we'd perform the get, and when we destroy 
the memslot, we'd do the put.

Just an idea.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-21  8:01   ` David Hildenbrand
  2025-05-22  0:45     ` Ackerley Tng
@ 2025-05-22  7:22     ` Fuad Tabba
  2025-05-22  8:56       ` David Hildenbrand
  1 sibling, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-22  7:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi David,

On Wed, 21 May 2025 at 09:01, David Hildenbrand <david@redhat.com> wrote:
>
> On 13.05.25 18:34, Fuad Tabba wrote:
> > From: Ackerley Tng <ackerleytng@google.com>
> >
> > This patch adds kvm_gmem_max_mapping_level(), which always returns
> > PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
> >
> > When guest_memfd supports shared memory, max_mapping_level (especially
> > when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
> > from recover_huge_pages_range()) should take input from
> > guest_memfd.
> >
> > Input from guest_memfd should be taken in these cases:
> >
> > + if the memslot supports shared memory (guest_memfd is used for
> >    shared memory, or in future both shared and private memory) or
> > + if the memslot is only used for private memory and that gfn is
> >    private.
> >
> > If the memslot doesn't use guest_memfd, figure out the
> > max_mapping_level using the host page tables like before.
> >
> > This patch also refactors and inlines the other call to
> > __kvm_mmu_max_mapping_level().
> >
> > In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
> > provided (if applicable) in fault->max_level. Hence, there is no need
> > to query guest_memfd.
> >
> > lpage_info is queried like before, and then if the fault is not from
> > guest_memfd, adjust fault->req_level based on input from host page
> > tables.
> >
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
> >   include/linux/kvm_host.h |  7 +++
> >   virt/kvm/guest_memfd.c   | 12 ++++++
> >   3 files changed, 79 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index cfbb471f7c70..9e0bc8114859 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
> >       return level;
> >   }
> [...]
>
> >   static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
> >                                           struct kvm_page_fault *fault,
> >                                           int order)
> > @@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> >   {
> >       unsigned int foll = fault->write ? FOLL_WRITE : 0;
> >
> > -     if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
> > +     if (fault_from_gmem(fault))
>
> Should this change rather have been done in the previous patch?
>
> (then only adjust fault_from_gmem() in this function as required)
>
> >               return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
> >
> >       foll |= FOLL_NOWAIT;
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index de7b46ee1762..f9bb025327c3 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >   int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >                    gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
> >                    int *max_order);
> > +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
> >   #else
> >   static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >                                  struct kvm_memory_slot *slot, gfn_t gfn,
> > @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >       KVM_BUG_ON(1, kvm);
> >       return -EIO;
> >   }
> > +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
> > +                                      gfn_t gfn)
>
> Probably should indent with two tabs here.

(I'm fixing the patch before respinning, hence it's me asking)

Not sure I understand. Indentation here matches the same style as that
for kvm_gmem_get_pfn() right above it in the alignment of the
parameters, i.e., the parameter `gfn_t gfn` is aligned with the
parameter `const struct kvm_memory_slot *slot` (four tabs and a
space).

Thanks,
/fuad


>
>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-05-22  7:16       ` David Hildenbrand
@ 2025-05-22  7:46         ` Fuad Tabba
  2025-05-22  8:14           ` David Hildenbrand
  0 siblings, 1 reply; 88+ messages in thread
From: Fuad Tabba @ 2025-05-22  7:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Ackerley Tng, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
	mpe, anup, paul.walmsley, palmer, aou, seanjc, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi David,

On Thu, 22 May 2025 at 08:16, David Hildenbrand <david@redhat.com> wrote:
>
>
> >>> + * shared (i.e., non-CoCo VMs).
> >>> + */
> >>>    static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>>    {
> >>> -   return IS_ENABLED(CONFIG_KVM_GMEM) &&
> >>> -          kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> >>> +   struct kvm_memory_slot *slot;
> >>> +
> >>> +   if (!IS_ENABLED(CONFIG_KVM_GMEM))
> >>> +           return false;
> >>> +
> >>> +   slot = gfn_to_memslot(kvm, gfn);
> >>> +   if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
> >>> +           /*
> >>> +            * For now, memslots only support in-place shared memory if the
> >>> +            * host is allowed to mmap memory (i.e., non-Coco VMs).
> >>> +            */
> >>
> >> Not accurate: there is no in-place conversion support in this series,
> >> because there is no such interface. So the reason is that all memory is
> >> shared for these VM types?
> >>
> >
> > True that there's no in-place conversion yet.
> >
> > In this patch series, guest_memfd memslots support shared memory only
> > for specific VM types (on x86, that would be KVM_X86_DEFAULT_VM and
> > KVM_X86_SW_PROTECTED_VMs).
> >
> > How about this wording:
> >
> > Without conversion support, if the guest_memfd memslot supports shared
> > memory, all memory must be used as not private (implicitly shared).
> >
>
> LGTM
>
> >>> +           return false;
> >>> +   }
> >>> +
> >>> +   return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> >>>    }
> >>>    #else
> >>>    static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> >>> index 2f499021df66..fe0245335c96 100644
> >>> --- a/virt/kvm/guest_memfd.c
> >>> +++ b/virt/kvm/guest_memfd.c
> >>> @@ -388,6 +388,23 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> >>>
> >>>     return 0;
> >>>    }
> >>> +
> >>> +bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
> >>> +{
> >>> +   struct file *file;
> >>> +   bool ret;
> >>> +
> >>> +   file = kvm_gmem_get_file((struct kvm_memory_slot *)slot);
> >>> +   if (!file)
> >>> +           return false;
> >>> +
> >>> +   ret = kvm_gmem_supports_shared(file_inode(file));
> >>> +
> >>> +   fput(file);
> >>> +   return ret;
> >>
> >> Would it make sense to cache that information in the memslot, to avoid
> >> the get/put?
> >>
> >> We could simply cache when creating the memslot I guess.
> >>
> >
> > When I wrote it I was assuming that to ensure correctness we should
> > check with guest memfd, like what if someone closed the gmem file in the
> > middle of the fault path?
> >
> > But I guess after the discussion at the last call, since the faulting
> > process is long and racy, if this check passed and we go to guest memfd
> > and the file was closed, it would just fail so I guess caching is fine.
>
> Yes, that would be my assumption. I mean, we also must make sure that if
> the user does something stupid like that, that we won't trigger other
> undesired code paths (like, suddenly the guest_memfd being !shared).
>
> >
> >> As an alternative ... could we simple get/put when managing the memslot?
> >
> > What does a simple get/put mean here?
>
> s/simple/simply/
>
> So when we create the memslot, we'd perform the get, and when we destroy
> the memslot, we'd do the put.
>
> Just an idea.

I'm not sure we can do that. The comment in kvm_gmem_bind() on
dropping the reference to the file explains why:
https://elixir.bootlin.com/linux/v6.14.7/source/virt/kvm/guest_memfd.c#L526

I think the best thing is to track whether a slot supports shared
memory inside struct kvm_memory_slot::struct gmem.

Thanks,
/fuad



> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-05-22  7:46         ` Fuad Tabba
@ 2025-05-22  8:14           ` David Hildenbrand
  2025-05-22 10:24             ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-22  8:14 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Ackerley Tng, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
	mpe, anup, paul.walmsley, palmer, aou, seanjc, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 22.05.25 09:46, Fuad Tabba wrote:
> Hi David,
> 
> On Thu, 22 May 2025 at 08:16, David Hildenbrand <david@redhat.com> wrote:
>>
>>
>>>>> + * shared (i.e., non-CoCo VMs).
>>>>> + */
>>>>>     static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>>>     {
>>>>> -   return IS_ENABLED(CONFIG_KVM_GMEM) &&
>>>>> -          kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>>>> +   struct kvm_memory_slot *slot;
>>>>> +
>>>>> +   if (!IS_ENABLED(CONFIG_KVM_GMEM))
>>>>> +           return false;
>>>>> +
>>>>> +   slot = gfn_to_memslot(kvm, gfn);
>>>>> +   if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>>>>> +           /*
>>>>> +            * For now, memslots only support in-place shared memory if the
>>>>> +            * host is allowed to mmap memory (i.e., non-CoCo VMs).
>>>>> +            */
>>>>
>>>> Not accurate: there is no in-place conversion support in this series,
>>>> because there is no such interface. So the reason is that all memory is
>>>> shared for these VM types?
>>>>
>>>
>>> True that there's no in-place conversion yet.
>>>
>>> In this patch series, guest_memfd memslots support shared memory only
>>> for specific VM types (on x86, that would be KVM_X86_DEFAULT_VM and
>>> KVM_X86_SW_PROTECTED_VMs).
>>>
>>> How about this wording:
>>>
>>> Without conversion support, if the guest_memfd memslot supports shared
>>> memory, all memory must be used as not private (implicitly shared).
>>>
>>
>> LGTM
>>
>>>>> +           return false;
>>>>> +   }
>>>>> +
>>>>> +   return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>>>>     }
>>>>>     #else
>>>>>     static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>>>> index 2f499021df66..fe0245335c96 100644
>>>>> --- a/virt/kvm/guest_memfd.c
>>>>> +++ b/virt/kvm/guest_memfd.c
>>>>> @@ -388,6 +388,23 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>>>>
>>>>>      return 0;
>>>>>     }
>>>>> +
>>>>> +bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
>>>>> +{
>>>>> +   struct file *file;
>>>>> +   bool ret;
>>>>> +
>>>>> +   file = kvm_gmem_get_file((struct kvm_memory_slot *)slot);
>>>>> +   if (!file)
>>>>> +           return false;
>>>>> +
>>>>> +   ret = kvm_gmem_supports_shared(file_inode(file));
>>>>> +
>>>>> +   fput(file);
>>>>> +   return ret;
>>>>
>>>> Would it make sense to cache that information in the memslot, to avoid
>>>> the get/put?
>>>>
>>>> We could simply cache when creating the memslot I guess.
>>>>
>>>
>>> When I wrote it I was assuming that to ensure correctness we should
>>> check with guest memfd, like what if someone closed the gmem file in the
>>> middle of the fault path?
>>>
>>> But I guess after the discussion at the last call, since the faulting
>>> process is long and racy, if this check passed and we go to guest memfd
>>> and the file was closed, it would just fail so I guess caching is fine.
>>
>> Yes, that would be my assumption. I mean, we also must make sure that if
>> the user does something stupid like that, that we won't trigger other
>> undesired code paths (like, suddenly the guest_memfd being !shared).
>>
>>>
>>>> As an alternative ... could we simple get/put when managing the memslot?
>>>
>>> What does a simple get/put mean here?
>>
>> s/simple/simply/
>>
>> So when we create the memslot, we'd perform the get, and when we destroy
>> the memslot, we'd do the put.
>>
>> Just an idea.
> 
> I'm not sure we can do that. The comment in kvm_gmem_bind() on
> dropping the reference to the file explains why:
> https://elixir.bootlin.com/linux/v6.14.7/source/virt/kvm/guest_memfd.c#L526

Right, although it is rather suboptimal; we have to constantly get/put 
the file, even in kvm_gmem_get_pfn() right now.

Repeatedly doing two atomics and a bunch of checks ... for something a sane 
use case should never trigger.

Anyhow, that's probably something to optimize also for 
kvm_gmem_get_pfn() later on? Of course, the caching here is rather 
straightforward.

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-22  7:22     ` Fuad Tabba
@ 2025-05-22  8:56       ` David Hildenbrand
  2025-05-22  9:34         ` Fuad Tabba
  0 siblings, 1 reply; 88+ messages in thread
From: David Hildenbrand @ 2025-05-22  8:56 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 22.05.25 09:22, Fuad Tabba wrote:
> Hi David,
> 
> On Wed, 21 May 2025 at 09:01, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 13.05.25 18:34, Fuad Tabba wrote:
>>> From: Ackerley Tng <ackerleytng@google.com>
>>>
>>> This patch adds kvm_gmem_max_mapping_level(), which always returns
>>> PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
>>>
>>> When guest_memfd supports shared memory, max_mapping_level (especially
>>> when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
>>> from recover_huge_pages_range()) should take input from
>>> guest_memfd.
>>>
>>> Input from guest_memfd should be taken in these cases:
>>>
>>> + if the memslot supports shared memory (guest_memfd is used for
>>>     shared memory, or in future both shared and private memory) or
>>> + if the memslot is only used for private memory and that gfn is
>>>     private.
>>>
>>> If the memslot doesn't use guest_memfd, figure out the
>>> max_mapping_level using the host page tables like before.
>>>
>>> This patch also refactors and inlines the other call to
>>> __kvm_mmu_max_mapping_level().
>>>
>>> In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
>>> provided (if applicable) in fault->max_level. Hence, there is no need
>>> to query guest_memfd.
>>>
>>> lpage_info is queried like before, and then if the fault is not from
>>> guest_memfd, adjust fault->req_level based on input from host page
>>> tables.
>>>
>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>>    arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
>>>    include/linux/kvm_host.h |  7 +++
>>>    virt/kvm/guest_memfd.c   | 12 ++++++
>>>    3 files changed, 79 insertions(+), 32 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index cfbb471f7c70..9e0bc8114859 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>>>        return level;
>>>    }
>> [...]
>>
>>>    static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>>>                                            struct kvm_page_fault *fault,
>>>                                            int order)
>>> @@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>>>    {
>>>        unsigned int foll = fault->write ? FOLL_WRITE : 0;
>>>
>>> -     if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
>>> +     if (fault_from_gmem(fault))
>>
>> Should this change rather have been done in the previous patch?
>>
>> (then only adjust fault_from_gmem() in this function as required)
>>
>>>                return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>>>
>>>        foll |= FOLL_NOWAIT;
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index de7b46ee1762..f9bb025327c3 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>    int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>>>                     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>>>                     int *max_order);
>>> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
>>>    #else
>>>    static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>>>                                   struct kvm_memory_slot *slot, gfn_t gfn,
>>> @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>>>        KVM_BUG_ON(1, kvm);
>>>        return -EIO;
>>>    }
>>> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
>>> +                                      gfn_t gfn)
>>
>> Probably should indent with two tabs here.
> 
> (I'm fixing the patch before respinning, hence it's me asking)
> 
> Not sure I understand. Indentation here matches the same style as that
> for kvm_gmem_get_pfn() right above it in the alignment of the
> parameters, i.e., the parameter `gfn_t gfn` is aligned with the
> parameter `const struct kvm_memory_slot *slot` (four tabs and a
> space).

Yeah, that way of indenting is rather bad practice. Especially for new 
code we're adding or when we touch existing code, we should just use two
tabs.

That way, we can fit more stuff into a single line, and when doing
simple changes, such as renaming the function or changing the return
type, we won't have to touch all the parameters.

Maybe KVM has its own rules on that ... that's why I said "probably" :)
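For illustration, the two wrapping conventions under discussion look like this (a dummy declaration with stub types, not real KVM code):

```c
#include <assert.h>

struct memory_slot { int dummy; };	/* stand-in; not a real KVM type */

/*
 * Parameter-aligned continuation (the prevailing KVM convention): the
 * wrapped argument lines up under the first parameter, so renaming the
 * function or changing its return type re-flows every continuation line.
 */
static int mapping_order_aligned(const struct memory_slot *slot,
				 unsigned long gfn)
{
	(void)slot;
	(void)gfn;
	return 0;	/* e.g. order 0 == 4K pages only */
}

/*
 * Two-tab continuation (the alternative suggested here): the wrapped
 * line is indented a fixed two tabs past the function's indentation,
 * so it survives renames and return-type changes untouched.
 */
static int mapping_order_two_tabs(const struct memory_slot *slot,
		unsigned long gfn)
{
	(void)slot;
	(void)gfn;
	return 0;
}
```

Both compile identically, of course; the disagreement is purely about churn versus visual alignment.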

-- 
Cheers,

David / dhildenb




* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-22  8:56       ` David Hildenbrand
@ 2025-05-22  9:34         ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-22  9:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, 22 May 2025 at 09:56, David Hildenbrand <david@redhat.com> wrote:
>
> On 22.05.25 09:22, Fuad Tabba wrote:
> > Hi David,
> >
> > On Wed, 21 May 2025 at 09:01, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 13.05.25 18:34, Fuad Tabba wrote:
> >>> From: Ackerley Tng <ackerleytng@google.com>
> >>>
> >>> This patch adds kvm_gmem_max_mapping_level(), which always returns
> >>> PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
> >>>
> >>> When guest_memfd supports shared memory, max_mapping_level (especially
> >>> when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
> >>> from recover_huge_pages_range()) should take input from
> >>> guest_memfd.
> >>>
> >>> Input from guest_memfd should be taken in these cases:
> >>>
> >>> + if the memslot supports shared memory (guest_memfd is used for
> >>>     shared memory, or in future both shared and private memory) or
> >>> + if the memslot is only used for private memory and that gfn is
> >>>     private.
> >>>
> >>> If the memslot doesn't use guest_memfd, figure out the
> >>> max_mapping_level using the host page tables like before.
> >>>
> >>> This patch also refactors and inlines the other call to
> >>> __kvm_mmu_max_mapping_level().
> >>>
> >>> In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
> >>> provided (if applicable) in fault->max_level. Hence, there is no need
> >>> to query guest_memfd.
> >>>
> >>> lpage_info is queried like before, and then if the fault is not from
> >>> guest_memfd, adjust fault->req_level based on input from host page
> >>> tables.
> >>>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>> ---
> >>>    arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
> >>>    include/linux/kvm_host.h |  7 +++
> >>>    virt/kvm/guest_memfd.c   | 12 ++++++
> >>>    3 files changed, 79 insertions(+), 32 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >>> index cfbb471f7c70..9e0bc8114859 100644
> >>> --- a/arch/x86/kvm/mmu/mmu.c
> >>> +++ b/arch/x86/kvm/mmu/mmu.c
> >>> @@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
> >>>        return level;
> >>>    }
> >> [...]
> >>
> >>>    static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
> >>>                                            struct kvm_page_fault *fault,
> >>>                                            int order)
> >>> @@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> >>>    {
> >>>        unsigned int foll = fault->write ? FOLL_WRITE : 0;
> >>>
> >>> -     if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
> >>> +     if (fault_from_gmem(fault))
> >>
> >> Should this change rather have been done in the previous patch?
> >>
> >> (then only adjust fault_from_gmem() in this function as required)
> >>
> >>>                return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
> >>>
> >>>        foll |= FOLL_NOWAIT;
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index de7b46ee1762..f9bb025327c3 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>>    int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >>>                     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
> >>>                     int *max_order);
> >>> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
> >>>    #else
> >>>    static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >>>                                   struct kvm_memory_slot *slot, gfn_t gfn,
> >>> @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >>>        KVM_BUG_ON(1, kvm);
> >>>        return -EIO;
> >>>    }
> >>> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
> >>> +                                      gfn_t gfn)
> >>
> >> Probably should indent with two tabs here.
> >
> > (I'm fixing the patch before respinning, hence it's me asking)
> >
> > Not sure I understand. Indentation here matches the same style as that
> > for kvm_gmem_get_pfn() right above it in the alignment of the
> > parameters, i.e., the parameter `gfn_t gfn` is aligned with the
> > parameter `const struct kvm_memory_slot *slot` (four tabs and a
> > space).
>
> Yeah, that way of indenting is rather bad practice. Especially for new
> code we're adding or when we touch existing code, we should just use two
> tabs.
>
> That way, we can fit more stuff into a single line, and when doing
> simple changes, such as renaming the function or changing the return
> type, we won't have to touch all the parameters.
>
> Maybe KVM has its own rules on that ... that's why I said "probably" :)

:)

I see, although I agree with you, I'd rather that indentation be
consistent within the same file.

Thanks,
/fuad
> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-05-22  8:14           ` David Hildenbrand
@ 2025-05-22 10:24             ` Fuad Tabba
  0 siblings, 0 replies; 88+ messages in thread
From: Fuad Tabba @ 2025-05-22 10:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Ackerley Tng, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
	mpe, anup, paul.walmsley, palmer, aou, seanjc, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, 22 May 2025 at 09:15, David Hildenbrand <david@redhat.com> wrote:
>
> On 22.05.25 09:46, Fuad Tabba wrote:
> > Hi David,
> >
> > On Thu, 22 May 2025 at 08:16, David Hildenbrand <david@redhat.com> wrote:
> >>
> >>
> >>>>> + * shared (i.e., non-CoCo VMs).
> >>>>> + */
> >>>>>     static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>>>>     {
> >>>>> -   return IS_ENABLED(CONFIG_KVM_GMEM) &&
> >>>>> -          kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> >>>>> +   struct kvm_memory_slot *slot;
> >>>>> +
> >>>>> +   if (!IS_ENABLED(CONFIG_KVM_GMEM))
> >>>>> +           return false;
> >>>>> +
> >>>>> +   slot = gfn_to_memslot(kvm, gfn);
> >>>>> +   if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
> >>>>> +           /*
> >>>>> +            * For now, memslots only support in-place shared memory if the
> >>>>> +            * host is allowed to mmap memory (i.e., non-CoCo VMs).
> >>>>> +            */
> >>>>
> >>>> Not accurate: there is no in-place conversion support in this series,
> >>>> because there is no such interface. So the reason is that all memory is
> >>>> shared for these VM types?
> >>>>
> >>>
> >>> True that there's no in-place conversion yet.
> >>>
> >>> In this patch series, guest_memfd memslots support shared memory only
> >>> for specific VM types (on x86, that would be KVM_X86_DEFAULT_VM and
> >>> KVM_X86_SW_PROTECTED_VMs).
> >>>
> >>> How about this wording:
> >>>
> >>> Without conversion support, if the guest_memfd memslot supports shared
> >>> memory, all memory must be used as not private (implicitly shared).
> >>>
> >>
> >> LGTM
> >>
> >>>>> +           return false;
> >>>>> +   }
> >>>>> +
> >>>>> +   return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> >>>>>     }
> >>>>>     #else
> >>>>>     static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> >>>>> index 2f499021df66..fe0245335c96 100644
> >>>>> --- a/virt/kvm/guest_memfd.c
> >>>>> +++ b/virt/kvm/guest_memfd.c
> >>>>> @@ -388,6 +388,23 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> >>>>>
> >>>>>      return 0;
> >>>>>     }
> >>>>> +
> >>>>> +bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
> >>>>> +{
> >>>>> +   struct file *file;
> >>>>> +   bool ret;
> >>>>> +
> >>>>> +   file = kvm_gmem_get_file((struct kvm_memory_slot *)slot);
> >>>>> +   if (!file)
> >>>>> +           return false;
> >>>>> +
> >>>>> +   ret = kvm_gmem_supports_shared(file_inode(file));
> >>>>> +
> >>>>> +   fput(file);
> >>>>> +   return ret;
> >>>>
> >>>> Would it make sense to cache that information in the memslot, to avoid
> >>>> the get/put?
> >>>>
> >>>> We could simply cache when creating the memslot I guess.
> >>>>
> >>>
> >>> When I wrote it I was assuming that to ensure correctness we should
> >>> check with guest memfd, like what if someone closed the gmem file in the
> >>> middle of the fault path?
> >>>
> >>> But I guess after the discussion at the last call, since the faulting
> >>> process is long and racy, if this check passed and we go to guest memfd
> >>> and the file was closed, it would just fail so I guess caching is fine.
> >>
> >> Yes, that would be my assumption. I mean, we also must make sure that if
> >> the user does something stupid like that, that we won't trigger other
> >> undesired code paths (like, suddenly the guest_memfd being !shared).
> >>
> >>>
> >>>> As an alternative ... could we simple get/put when managing the memslot?
> >>>
> >>> What does a simple get/put mean here?
> >>
> >> s/simple/simply/
> >>
> >> So when we create the memslot, we'd perform the get, and when we destroy
> >> the memslot, we'd do the put.
> >>
> >> Just an idea.
> >
> > I'm not sure we can do that. The comment in kvm_gmem_bind() on
> > dropping the reference to the file explains why:
> > https://elixir.bootlin.com/linux/v6.14.7/source/virt/kvm/guest_memfd.c#L526
>
> Right, although it is rather suboptimal; we have to constantly get/put
> the file, even in kvm_gmem_get_pfn() right now.
>
> Repeatedly doing two atomics and a bunch of checks ... for something a sane
> use case should never trigger.
>
> Anyhow, that's probably something to optimize also for
> kvm_gmem_get_pfn() later on? Of course, the caching here is rather
> straightforward.

Done.

Thanks,
/fuad

> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-22  0:45     ` Ackerley Tng
@ 2025-05-22 13:22       ` Sean Christopherson
  2025-05-22 13:49         ` David Hildenbrand
  0 siblings, 1 reply; 88+ messages in thread
From: Sean Christopherson @ 2025-05-22 13:22 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: David Hildenbrand, tabba, kvm, linux-arm-msm, linux-mm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Wed, May 21, 2025, Ackerley Tng wrote:
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index de7b46ee1762..f9bb025327c3 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>   int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >>   		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
> >>   		     int *max_order);
> >> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
> >>   #else
> >>   static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >>   				   struct kvm_memory_slot *slot, gfn_t gfn,
> >> @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >>   	KVM_BUG_ON(1, kvm);
> >>   	return -EIO;
> >>   }
> >> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
> >> +					 gfn_t gfn)
> >
> > Probably should indent with two tabs here.
> 
> Yup!

Nope!  :-)

In KVM, please align the indentation as you did.

 : Yeah, that way of indenting is rather bad practice. Especially for new
 : code we're adding or when we touch existing code, we should just use two
 : tabs.

 : That way, we can fit more stuff into a single line, and when doing
 : simple changes, such as renaming the function or changing the return
 : type, we won't have to touch all the parameters.

At the cost of readability, IMO.  The number of eyeballs that read the code is
orders of magnitude greater than the number of times a function's parameters end
up being shuffled around.  Sacrificing readability and consistency to avoid a
small amount of rare churn isn't a good tradeoff.



* Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd
  2025-05-22 13:22       ` Sean Christopherson
@ 2025-05-22 13:49         ` David Hildenbrand
  0 siblings, 0 replies; 88+ messages in thread
From: David Hildenbrand @ 2025-05-22 13:49 UTC (permalink / raw)
  To: Sean Christopherson, Ackerley Tng
  Cc: tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, mail, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 22.05.25 15:22, Sean Christopherson wrote:
> On Wed, May 21, 2025, Ackerley Tng wrote:
>>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>>> index de7b46ee1762..f9bb025327c3 100644
>>>> --- a/include/linux/kvm_host.h
>>>> +++ b/include/linux/kvm_host.h
>>>> @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>>    int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>>>>    		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>>>>    		     int *max_order);
>>>> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
>>>>    #else
>>>>    static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>>>>    				   struct kvm_memory_slot *slot, gfn_t gfn,
>>>> @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>>>>    	KVM_BUG_ON(1, kvm);
>>>>    	return -EIO;
>>>>    }
>>>> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
>>>> +					 gfn_t gfn)
>>>
>>> Probably should indent with two tabs here.
>>
>> Yup!
> 
> Nope!  :-)
> 
> In KVM, please align the indentation as you did.
> 
>   : Yeah, that way of indenting is rather bad practice. Especially for new
>   : code we're adding or when we touch existing code, we should just use two
>   : tabs.
> 
>   : That way, we can fit more stuff into a single line, and when doing
>   : simple changes, such as renaming the function or changing the return
>   : type, we won't have to touch all the parameters.
> 
> At the cost of readability, IMO.  The number of eyeballs that read the code is
> orders of magnitude greater than the number of times a function's parameters end
> up being shuffled around.  Sacrificing readability and consistenty to avoid a
> small amount of rare churn isn't a good tradeoff.

I knew KVM wanted to be weird! :P

-- 
Cheers,

David / dhildenb




end of thread, other threads:[~2025-05-22 13:49 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-13 16:34 [PATCH v9 00/17] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-05-13 16:34 ` [PATCH v9 01/17] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-05-21  7:14   ` Gavin Shan
2025-05-13 16:34 ` [PATCH v9 02/17] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-05-13 21:56   ` Ira Weiny
2025-05-21  7:14   ` Gavin Shan
2025-05-13 16:34 ` [PATCH v9 03/17] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
2025-05-21  7:15   ` Gavin Shan
2025-05-13 16:34 ` [PATCH v9 04/17] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
2025-05-21  7:15   ` Gavin Shan
2025-05-13 16:34 ` [PATCH v9 05/17] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-05-21  7:16   ` Gavin Shan
2025-05-13 16:34 ` [PATCH v9 06/17] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-05-21  7:16   ` Gavin Shan
2025-05-13 16:34 ` [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
2025-05-13 18:37   ` Ackerley Tng
2025-05-16 19:21     ` James Houghton
2025-05-18 15:17       ` Fuad Tabba
2025-05-21  7:36         ` David Hildenbrand
2025-05-14  8:03   ` Shivank Garg
2025-05-14  9:45     ` Fuad Tabba
2025-05-14 10:07   ` Roy, Patrick
2025-05-14 11:30     ` Fuad Tabba
2025-05-14 20:40   ` James Houghton
2025-05-15  7:25     ` Fuad Tabba
2025-05-15 23:42   ` Gavin Shan
2025-05-16  7:31     ` Fuad Tabba
2025-05-16  6:08   ` Gavin Shan
2025-05-16  7:56     ` Fuad Tabba
2025-05-16 11:12       ` Gavin Shan
2025-05-16 14:20         ` Fuad Tabba
2025-05-21  7:41   ` David Hildenbrand
2025-05-13 16:34 ` [PATCH v9 08/17] KVM: guest_memfd: Check that userspace_addr and fd+offset refer to same range Fuad Tabba
2025-05-13 20:30   ` James Houghton
2025-05-14  7:33     ` Fuad Tabba
2025-05-14 13:32       ` Sean Christopherson
2025-05-14 13:47         ` Ackerley Tng
2025-05-14 13:52           ` Sean Christopherson
2025-05-14 17:39   ` David Hildenbrand
2025-05-13 16:34 ` [PATCH v9 09/17] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
2025-05-21  7:48   ` David Hildenbrand
2025-05-22  0:40     ` Ackerley Tng
2025-05-22  7:16       ` David Hildenbrand
2025-05-22  7:46         ` Fuad Tabba
2025-05-22  8:14           ` David Hildenbrand
2025-05-22 10:24             ` Fuad Tabba
2025-05-13 16:34 ` [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd Fuad Tabba
2025-05-14  7:13   ` Shivank Garg
2025-05-14  7:24     ` Fuad Tabba
2025-05-14 15:27   ` kernel test robot
2025-05-21  8:01   ` David Hildenbrand
2025-05-22  0:45     ` Ackerley Tng
2025-05-22 13:22       ` Sean Christopherson
2025-05-22 13:49         ` David Hildenbrand
2025-05-22  7:22     ` Fuad Tabba
2025-05-22  8:56       ` David Hildenbrand
2025-05-22  9:34         ` Fuad Tabba
2025-05-13 16:34 ` [PATCH v9 11/17] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
2025-05-13 16:34 ` [PATCH v9 12/17] KVM: arm64: Rename variables in user_mem_abort() Fuad Tabba
2025-05-21  2:25   ` Gavin Shan
2025-05-21  9:57     ` Fuad Tabba
2025-05-21  8:02   ` David Hildenbrand
2025-05-13 16:34 ` [PATCH v9 13/17] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
2025-05-14 21:26   ` James Houghton
2025-05-15  9:27     ` Fuad Tabba
2025-05-21  8:04   ` David Hildenbrand
2025-05-21 11:10     ` Fuad Tabba
2025-05-13 16:34 ` [PATCH v9 14/17] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
2025-05-15 23:50   ` James Houghton
2025-05-16  7:07     ` Fuad Tabba
2025-05-21  8:05   ` David Hildenbrand
2025-05-21 10:12     ` Fuad Tabba
2025-05-21 10:26       ` David Hildenbrand
2025-05-21 10:29         ` Fuad Tabba
2025-05-21 12:44           ` David Hildenbrand
2025-05-21 13:15             ` Fuad Tabba
2025-05-21 13:21               ` David Hildenbrand
2025-05-21 13:32                 ` Fuad Tabba
2025-05-21 13:45                   ` David Hildenbrand
2025-05-21 14:14                     ` Fuad Tabba
2025-05-13 16:34 ` [PATCH v9 15/17] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
2025-05-21  2:46   ` Gavin Shan
2025-05-21  8:24     ` Fuad Tabba
2025-05-21  8:06   ` David Hildenbrand
2025-05-13 16:34 ` [PATCH v9 16/17] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-05-21  6:53   ` Gavin Shan
2025-05-21  9:38     ` Fuad Tabba
2025-05-13 16:34 ` [PATCH v9 17/17] KVM: selftests: Test guest_memfd same-range validation Fuad Tabba
