* [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Main changes since v10 [1]:
- Added bounds checking when faulting a shared page into the host, along
  with a selftest to verify the check.
- Refactored KVM/arm64's handling of guest faults (user_mem_abort()).
  I've dropped the Reviewed-by tags from "KVM: arm64: Refactor
  user_mem_abort()..." since it has changed significantly.
- Handled nested virtualization in KVM/arm64 when faulting guest_memfd
  backed pages into the guest.
- Addressed various points of feedback from the last revision.
- Still based on Linux 6.15.

This patch series enables the mapping of guest_memfd backed memory in
the host. This is useful for VMMs like Firecracker that aim to run
guests entirely backed by guest_memfd [2]. When combined with Patrick's
series for direct map removal [3], this provides additional hardening
against Spectre-like transient execution attacks.

This series also lays the groundwork for restricted mmap() support for
guest_memfd backed memory in the host for Confidential Computing
platforms that permit in-place sharing of guest memory with the host
[4].

Patch breakdown:

Patches 1-7: Primarily refactoring and renaming to decouple the concept
of guest memory being "private" from it being backed by guest_memfd.

Patches 8-9: Add support for in-place shared memory and the ability for
the host to map it. This is gated by a new configuration option, toggled
by a new flag, and advertised to userspace by a new capability
(introduced in patch 16); see the usage sketch below.

Patches 10-15: Implement the x86 and arm64 support for this feature.

Patch 16: Introduces the new capability to advertise this support and
updates the documentation.

Patches 17-18: Add and fix selftests for the new functionality.

For details on how to test this patch series, and on how to boot a guest
that uses the new features, please refer to v8 [5].
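
To make the flow concrete, here is a minimal userspace sketch (not taken
from this series; error handling is trimmed and the VM fd is assumed to
have been created already) that ties together the flag from patch 8 and
the capability from patch 16:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>

  /* Create a guest_memfd that supports shared memory and map it. */
  static int create_and_map_gmem(int vm_fd, size_t size, void **va)
  {
          struct kvm_create_guest_memfd gmem = {
                  .size  = size,
                  .flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED,
          };
          int gmem_fd;

          /* Only attempt this if KVM advertises the new capability. */
          if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM) <= 0)
                  return -1;

          gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
          if (gmem_fd < 0)
                  return -1;

          /* The mapping must be MAP_SHARED; private mappings are rejected. */
          *va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     gmem_fd, 0);
          return *va == MAP_FAILED ? -1 : gmem_fd;
  }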

Cheers,
/fuad

[1] https://lore.kernel.org/all/20250527180245.1413463-1-tabba@google.com/
[2] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[3] https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
[4] https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
[5] https://lore.kernel.org/all/20250430165655.605595-1-tabba@google.com/

Ackerley Tng (2):
  KVM: x86/mmu: Handle guest page faults for guest_memfd with shared
    memory
  KVM: x86: Consult guest_memfd when computing max_mapping_level

Fuad Tabba (16):
  KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
  KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
    CONFIG_KVM_GENERIC_GMEM_POPULATE
  KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
  KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
  KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
  KVM: Fix comments that refer to slots_lock
  KVM: Fix comment that refers to kvm uapi header path
  KVM: guest_memfd: Allow host to map guest_memfd pages
  KVM: guest_memfd: Track shared memory support in memslot
  KVM: x86: Enable guest_memfd shared memory for SW-protected VMs
  KVM: arm64: Refactor user_mem_abort()
  KVM: arm64: Handle guest_memfd-backed guest page faults
  KVM: arm64: Enable host mapping of shared guest_memfd memory
  KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
  KVM: selftests: Don't use hardcoded page sizes in guest_memfd test
  KVM: selftests: guest_memfd mmap() test when mapping is allowed

 Documentation/virt/kvm/api.rst                |   9 +
 arch/arm64/include/asm/kvm_host.h             |   5 +
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/mmu.c                          | 200 +++++++++++++----
 arch/x86/include/asm/kvm_host.h               |  22 +-
 arch/x86/kvm/Kconfig                          |   5 +-
 arch/x86/kvm/mmu/mmu.c                        | 135 ++++++-----
 arch/x86/kvm/svm/sev.c                        |   4 +-
 arch/x86/kvm/svm/svm.c                        |   4 +-
 arch/x86/kvm/x86.c                            |   4 +-
 include/linux/kvm_host.h                      |  80 +++++--
 include/uapi/linux/kvm.h                      |   2 +
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 212 +++++++++++++++---
 virt/kvm/Kconfig                              |  14 +-
 virt/kvm/Makefile.kvm                         |   2 +-
 virt/kvm/guest_memfd.c                        |  94 +++++++-
 virt/kvm/kvm_main.c                           |  16 +-
 virt/kvm/kvm_mm.h                             |   4 +-
 19 files changed, 645 insertions(+), 169 deletions(-)


base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The option KVM_PRIVATE_MEM enables guest_memfd in general. Subsequent
patches add shared memory support to guest_memfd. Therefore, rename it
to KVM_GMEM to make its purpose clearer.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 include/linux/kvm_host.h        | 10 +++++-----
 virt/kvm/Kconfig                |  8 ++++----
 virt/kvm/Makefile.kvm           |  2 +-
 virt/kvm/kvm_main.c             |  4 ++--
 virt/kvm/kvm_mm.h               |  4 ++--
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7bc174a1f1cb..52f6f6d08558 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2253,7 +2253,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 		       int tdp_max_root_level, int tdp_huge_page_level);
 
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
 #else
 #define kvm_arch_has_private_mem(kvm) false
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 291d49b9bf05..d6900995725d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -601,7 +601,7 @@ struct kvm_memory_slot {
 	short id;
 	u16 as_id;
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	struct {
 		/*
 		 * Writes protected by kvm->slots_lock.  Acquiring a
@@ -722,7 +722,7 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
  * Arch code must define kvm_arch_has_private_mem if support for private memory
  * is enabled.
  */
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
+#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
 static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 {
 	return false;
@@ -2504,7 +2504,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
-	return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
+	return IS_ENABLED(CONFIG_KVM_GMEM) &&
 	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
 #else
@@ -2514,7 +2514,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 }
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
@@ -2527,7 +2527,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 727b542074e7..49df4e32bff7 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -112,19 +112,19 @@ config KVM_GENERIC_MEMORY_ATTRIBUTES
        depends on KVM_GENERIC_MMU_NOTIFIER
        bool
 
-config KVM_PRIVATE_MEM
+config KVM_GMEM
        select XARRAY_MULTI
        bool
 
 config KVM_GENERIC_PRIVATE_MEM
        select KVM_GENERIC_MEMORY_ATTRIBUTES
-       select KVM_PRIVATE_MEM
+       select KVM_GMEM
        bool
 
 config HAVE_KVM_ARCH_GMEM_PREPARE
        bool
-       depends on KVM_PRIVATE_MEM
+       depends on KVM_GMEM
 
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
-       depends on KVM_PRIVATE_MEM
+       depends on KVM_GMEM
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 724c89af78af..8d00918d4c8b 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -12,4 +12,4 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
-kvm-$(CONFIG_KVM_PRIVATE_MEM) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_KVM_GMEM) += $(KVM)/guest_memfd.o
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e85b33a92624..4996cac41a8f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4842,7 +4842,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_MEMORY_ATTRIBUTES:
 		return kvm_supported_mem_attributes(kvm);
 #endif
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
@@ -5276,7 +5276,7 @@ static long kvm_vm_ioctl(struct file *filp,
 	case KVM_GET_STATS_FD:
 		r = kvm_vm_ioctl_get_stats_fd(kvm);
 		break;
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	case KVM_CREATE_GUEST_MEMFD: {
 		struct kvm_create_guest_memfd guest_memfd;
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index acef3f5c582a..ec311c0d6718 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -67,7 +67,7 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 }
 #endif /* HAVE_KVM_PFNCACHE */
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 void kvm_gmem_init(struct module *module);
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -91,6 +91,6 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 {
 	WARN_ON_ONCE(1);
 }
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
 
 #endif /* __KVM_MM_H__ */
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
clearer.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/Kconfig     | 4 ++--
 include/linux/kvm_host.h | 2 +-
 virt/kvm/Kconfig         | 2 +-
 virt/kvm/guest_memfd.c   | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fe8ea8c097de..b37258253543 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -46,7 +46,7 @@ config KVM_X86
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
 	select KVM_GENERIC_PRE_FAULT_MEMORY
-	select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+	select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
 	select KVM_WERROR if WERROR
 
 config KVM
@@ -145,7 +145,7 @@ config KVM_AMD_SEV
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	select ARCH_HAS_CC_PLATFORM
-	select KVM_GENERIC_PRIVATE_MEM
+	select KVM_GENERIC_GMEM_POPULATE
 	select HAVE_KVM_ARCH_GMEM_PREPARE
 	select HAVE_KVM_ARCH_GMEM_INVALIDATE
 	help
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d6900995725d..7ca23837fa52 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2533,7 +2533,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
 #endif
 
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 /**
  * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
  *
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 49df4e32bff7..559c93ad90be 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -116,7 +116,7 @@ config KVM_GMEM
        select XARRAY_MULTI
        bool
 
-config KVM_GENERIC_PRIVATE_MEM
+config KVM_GENERIC_GMEM_POPULATE
        select KVM_GENERIC_MEMORY_ATTRIBUTES
        select KVM_GMEM
        bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..befea51bbc75 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -638,7 +638,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
 {
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The function kvm_arch_has_private_mem() indicates whether an architecture
supports guest_memfd. Until now, this support implied the memory was
strictly private.

To decouple guest_memfd support from memory privacy, rename this
function to kvm_arch_supports_gmem().

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 8 ++++----
 arch/x86/kvm/mmu/mmu.c          | 8 ++++----
 include/linux/kvm_host.h        | 6 +++---
 virt/kvm/kvm_main.c             | 6 +++---
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52f6f6d08558..4a83fbae7056 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2254,9 +2254,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 
 #ifdef CONFIG_KVM_GMEM
-#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
 #else
-#define kvm_arch_has_private_mem(kvm) false
+#define kvm_arch_supports_gmem(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
@@ -2309,8 +2309,8 @@ enum {
 #define HF_SMM_INSIDE_NMI_MASK	(1 << 2)
 
 # define KVM_MAX_NR_ADDRESS_SPACES	2
-/* SMM is currently unsupported for guests with private memory. */
-# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)
+/* SMM is currently unsupported for guests with guest_memfd (esp private) memory. */
+# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_supports_gmem(kvm) ? 1 : 2)
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
 #else
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d1b632e33d2..b66f1bf24e06 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4917,7 +4917,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 	if (r)
 		return r;
 
-	if (kvm_arch_has_private_mem(vcpu->kvm) &&
+	if (kvm_arch_supports_gmem(vcpu->kvm) &&
 	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))
 		error_code |= PFERR_PRIVATE_ACCESS;
 
@@ -7705,7 +7705,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
 	 * Zapping SPTEs in this case ensures KVM will reassess whether or not
 	 * a hugepage can be used for affected ranges.
 	 */
-	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+	if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
 		return false;
 
 	if (WARN_ON_ONCE(range->end <= range->start))
@@ -7784,7 +7784,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 	 * a range that has PRIVATE GFNs, and conversely converting a range to
 	 * SHARED may now allow hugepages.
 	 */
-	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+	if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
 		return false;
 
 	/*
@@ -7840,7 +7840,7 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
 {
 	int level;
 
-	if (!kvm_arch_has_private_mem(kvm))
+	if (!kvm_arch_supports_gmem(kvm))
 		return;
 
 	for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7ca23837fa52..6ca7279520cf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -719,11 +719,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
 #endif
 
 /*
- * Arch code must define kvm_arch_has_private_mem if support for private memory
+ * Arch code must define kvm_arch_supports_gmem if support for guest_memfd
  * is enabled.
  */
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
-static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
+#if !defined(kvm_arch_supports_gmem) && !IS_ENABLED(CONFIG_KVM_GMEM)
+static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
 {
 	return false;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4996cac41a8f..2468d50a9ed4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1531,7 +1531,7 @@ static int check_memory_region_flags(struct kvm *kvm,
 {
 	u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
-	if (kvm_arch_has_private_mem(kvm))
+	if (kvm_arch_supports_gmem(kvm))
 		valid_flags |= KVM_MEM_GUEST_MEMFD;
 
 	/* Dirty logging private memory is not currently supported. */
@@ -2362,7 +2362,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static u64 kvm_supported_mem_attributes(struct kvm *kvm)
 {
-	if (!kvm || kvm_arch_has_private_mem(kvm))
+	if (!kvm || kvm_arch_supports_gmem(kvm))
 		return KVM_MEMORY_ATTRIBUTE_PRIVATE;
 
 	return 0;
@@ -4844,7 +4844,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 #ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
-		return !kvm || kvm_arch_has_private_mem(kvm);
+		return !kvm || kvm_arch_supports_gmem(kvm);
 #endif
 	default:
 		break;
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The bool has_private_mem is used to indicate whether guest_memfd is
supported. Rename it to supports_gmem to make its meaning clearer and to
decouple memory being private from guest_memfd.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 4 ++--
 arch/x86/kvm/mmu/mmu.c          | 2 +-
 arch/x86/kvm/svm/svm.c          | 4 ++--
 arch/x86/kvm/x86.c              | 3 +--
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a83fbae7056..709cc2a7ba66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1331,7 +1331,7 @@ struct kvm_arch {
 	unsigned int indirect_shadow_pages;
 	u8 mmu_valid_gen;
 	u8 vm_type;
-	bool has_private_mem;
+	bool supports_gmem;
 	bool has_protected_state;
 	bool pre_fault_allowed;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
@@ -2254,7 +2254,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 
 #ifdef CONFIG_KVM_GMEM
-#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
 #else
 #define kvm_arch_supports_gmem(kvm) false
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b66f1bf24e06..69bf2ef22ed0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3486,7 +3486,7 @@ static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault
 	 * on RET_PF_SPURIOUS until the update completes, or an actual spurious
 	 * case might go down the slow path. Either case will resolve itself.
 	 */
-	if (kvm->arch.has_private_mem &&
+	if (kvm->arch.supports_gmem &&
 	    fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
 		return false;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a89c271a1951..a05b7dc7b717 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5110,8 +5110,8 @@ static int svm_vm_init(struct kvm *kvm)
 			(type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
 		to_kvm_sev_info(kvm)->need_init = true;
 
-		kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
-		kvm->arch.pre_fault_allowed = !kvm->arch.has_private_mem;
+		kvm->arch.supports_gmem = (type == KVM_X86_SNP_VM);
+		kvm->arch.pre_fault_allowed = !kvm->arch.supports_gmem;
 	}
 
 	if (!pause_filter_count || !pause_filter_thresh)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be7bb6d20129..035ced06b2dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12718,8 +12718,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		return -EINVAL;
 
 	kvm->arch.vm_type = type;
-	kvm->arch.has_private_mem =
-		(type == KVM_X86_SW_PROTECTED_VM);
+	kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
 	/* Decided by the vendor code for other VM types.  */
 	kvm->arch.pre_fault_allowed =
 		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The function kvm_slot_can_be_private() is used to check whether a memory
slot is backed by guest_memfd. Rename it to kvm_slot_has_gmem() to make
that clearer and to decouple memory being private from guest_memfd.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 4 ++--
 arch/x86/kvm/svm/sev.c   | 4 ++--
 include/linux/kvm_host.h | 2 +-
 virt/kvm/guest_memfd.c   | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 69bf2ef22ed0..2b6376986f96 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3283,7 +3283,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	bool is_private = kvm_slot_can_be_private(slot) &&
+	bool is_private = kvm_slot_has_gmem(slot) &&
 			  kvm_mem_is_private(kvm, gfn);
 
 	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
@@ -4496,7 +4496,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 {
 	int max_order, r;
 
-	if (!kvm_slot_can_be_private(fault->slot)) {
+	if (!kvm_slot_has_gmem(fault->slot)) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return -EFAULT;
 	}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a7a7dc507336..27759ca6d2f2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2378,7 +2378,7 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	mutex_lock(&kvm->slots_lock);
 
 	memslot = gfn_to_memslot(kvm, params.gfn_start);
-	if (!kvm_slot_can_be_private(memslot)) {
+	if (!kvm_slot_has_gmem(memslot)) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -4688,7 +4688,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
 	}
 
 	slot = gfn_to_memslot(kvm, gfn);
-	if (!kvm_slot_can_be_private(slot)) {
+	if (!kvm_slot_has_gmem(slot)) {
 		pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
 				    gpa);
 		return;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6ca7279520cf..d9616ee6acc7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -614,7 +614,7 @@ struct kvm_memory_slot {
 #endif
 };
 
-static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
+static inline bool kvm_slot_has_gmem(const struct kvm_memory_slot *slot)
 {
 	return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
 }
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index befea51bbc75..6db515833f61 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -654,7 +654,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 		return -EINVAL;
 
 	slot = gfn_to_memslot(kvm, start_gfn);
-	if (!kvm_slot_can_be_private(slot))
+	if (!kvm_slot_has_gmem(slot))
 		return -EINVAL;
 
 	file = kvm_gmem_get_file(slot);
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 06/18] KVM: Fix comments that refer to slots_lock
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Fix comments so that they refer to slots_lock instead of slots_locks
(remove trailing s).

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 2 +-
 virt/kvm/kvm_main.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d9616ee6acc7..ae70e4e19700 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -859,7 +859,7 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-	/* Protected by slots_locks (for writes) and RCU (for reads) */
+	/* Protected by slots_lock (for writes) and RCU (for reads) */
 	struct xarray mem_attr_array;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2468d50a9ed4..6289ea1685dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -333,7 +333,7 @@ void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
 	 * All current use cases for flushing the TLBs for a specific memslot
 	 * are related to dirty logging, and many do the TLB flush out of
 	 * mmu_lock. The interaction between the various operations on memslot
-	 * must be serialized by slots_locks to ensure the TLB flush from one
+	 * must be serialized by slots_lock to ensure the TLB flush from one
 	 * operation is observed by any other operation on the same memslot.
 	 */
 	lockdep_assert_held(&kvm->slots_lock);
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 07/18] KVM: Fix comment that refers to kvm uapi header path
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The comment that points to the path where the user-visible memslot flags
are defined refers to an outdated path and has a typo.

Update the comment to refer to the correct path.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae70e4e19700..80371475818f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -52,7 +52,7 @@
 /*
  * The bit 16 ~ bit 31 of kvm_userspace_memory_region::flags are internally
  * used in kvm, other bits are visible for userspace which are defined in
- * include/linux/kvm_h.
+ * include/uapi/linux/kvm.h.
  */
 #define KVM_MEMSLOT_INVALID	(1UL << 16)
 
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

This patch enables support for shared memory in guest_memfd, including
mapping that memory from host userspace.

This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
flag at creation time.
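
As a point of reference, a minimal sketch of what an architecture opt-in
looks like under this scheme (the real x86 and arm64 enablement follows
later in this series; the condition used here is a placeholder):

  # arch/<arch>/kvm/Kconfig: pull in the generic support
  config KVM
          ...
          select KVM_GMEM_SHARED_MEM

  /* arch/<arch>/include/asm/kvm_host.h: override the generic default.
   * The condition is hypothetical; each arch supplies its own. */
  #define kvm_arch_supports_gmem_shared_mem(kvm) \
          (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) && kvm_arch_supports_gmem(kvm))

With that in place, kvm_gmem_create() accepts GUEST_MEMFD_FLAG_SUPPORT_SHARED
for the VM, and mmap() of such a file is served by the fault handler added
below.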

Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 13 +++++++
 include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig         |  4 +++
 virt/kvm/guest_memfd.c   | 76 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 80371475818f..640ce714cfb2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
 }
 #endif
 
+/*
+ * Returns true if this VM supports shared mem in guest_memfd.
+ *
+ * Arch code must define kvm_arch_supports_gmem_shared_mem if support for
+ * guest_memfd is enabled.
+ */
+#if !defined(kvm_arch_supports_gmem_shared_mem)
+static inline bool kvm_arch_supports_gmem_shared_mem(struct kvm *kvm)
+{
+	return false;
+}
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6ae8ad8934b..c2714c9d1a0e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1ULL << 0)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 559c93ad90be..e90884f74404 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
        depends on KVM_GMEM
+
+config KVM_GMEM_SHARED_MEM
+       select KVM_GMEM
+       bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6db515833f61..7a158789d1df 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -312,7 +312,79 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static bool kvm_gmem_supports_shared(struct inode *inode)
+{
+	u64 flags;
+
+	if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
+		return false;
+
+	flags = (u64)inode->i_private;
+
+	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+}
+
+static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
+		return VM_FAULT_SIGBUS;
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		int err = PTR_ERR(folio);
+
+		if (err == -EAGAIN)
+			return VM_FAULT_RETRY;
+
+		return vmf_error(err);
+	}
+
+	if (WARN_ON_ONCE(folio_test_large(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		clear_highpage(folio_page(folio, 0));
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault_shared,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (!kvm_gmem_supports_shared(file_inode(file)))
+		return -ENODEV;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -428,6 +500,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	}
 
 	file->f_flags |= O_LARGEFILE;
+	allow_write_access(file);
 
 	inode = file->f_inode;
 	WARN_ON(file->f_mapping != inode->i_mapping);
@@ -463,6 +536,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 flags = args->flags;
 	u64 valid_flags = 0;
 
+	if (kvm_arch_supports_gmem_shared_mem(kvm))
+		valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
 
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 09/18] KVM: guest_memfd: Track shared memory support in memslot
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Add a new internal flag, in the top half of memslot->flags (the half
reserved for internal use in KVM), to track when a guest_memfd-backed
slot supports shared memory.

This avoids repeatedly checking the underlying guest_memfd file for
shared memory support, which requires taking a reference on the file.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 11 ++++++++++-
 virt/kvm/guest_memfd.c   |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 640ce714cfb2..6326d1ad8225 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -54,7 +54,8 @@
  * used in kvm, other bits are visible for userspace which are defined in
  * include/uapi/linux/kvm.h.
  */
-#define KVM_MEMSLOT_INVALID	(1UL << 16)
+#define KVM_MEMSLOT_INVALID			(1UL << 16)
+#define KVM_MEMSLOT_SUPPORTS_GMEM_SHARED	(1UL << 17)
 
 /*
  * Bit 63 of the memslot generation number is an "update in-progress flag",
@@ -2502,6 +2503,14 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
 }
 
+static inline bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
+{
+	if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
+		return false;
+
+	return slot->flags & KVM_MEMSLOT_SUPPORTS_GMEM_SHARED;
+}
+
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
 {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 7a158789d1df..e0fa49699e05 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -595,6 +595,8 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	 */
 	WRITE_ONCE(slot->gmem.file, file);
 	slot->gmem.pgoff = start;
+	if (kvm_gmem_supports_shared(inode))
+		slot->flags |= KVM_MEMSLOT_SUPPORTS_GMEM_SHARED;
 
 	xa_store_range(&gmem->bindings, start, end - 1, slot, GFP_KERNEL);
 	filemap_invalidate_unlock(inode->i_mapping);
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

For memslots backed by guest_memfd with shared memory support, the KVM
MMU must always fault in pages from guest_memfd, and not from the host
userspace_addr. Update the fault handler to do so.

This patch also refactors related function names for accuracy:

kvm_mem_is_private() returns true only when the current private/shared
state (in the CoCo sense) of the memory is private, and returns false if
the current state is shared, whether explicitly or implicitly, e.g.,
because the memory belongs to a non-CoCo VM.

kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used to
fault in not just private memory, but more generally, from guest_memfd.

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 38 +++++++++++++++++++++++---------------
 include/linux/kvm_host.h | 25 +++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2b6376986f96..5b7df2905aa9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3289,6 +3289,11 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
 }
 
+static inline bool fault_from_gmem(struct kvm_page_fault *fault)
+{
+	return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
+}
+
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -4465,21 +4470,25 @@ static inline u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
-					u8 max_level, int gmem_order)
+static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
+					    struct kvm_page_fault *fault,
+					    int order)
 {
-	u8 req_max_level;
+	u8 max_level = fault->max_level;
 
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
+	max_level = min(kvm_max_level_for_order(order), max_level);
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
-	if (req_max_level)
-		max_level = min(max_level, req_max_level);
+	if (fault->is_private) {
+		u8 level = kvm_x86_call(private_max_mapping_level)(kvm, fault->pfn);
+
+		if (level)
+			max_level = min(max_level, level);
+	}
 
 	return max_level;
 }
@@ -4491,10 +4500,10 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
 				 r == RET_PF_RETRY, fault->map_writable);
 }
 
-static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
-				       struct kvm_page_fault *fault)
+static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault)
 {
-	int max_order, r;
+	int gmem_order, r;
 
 	if (!kvm_slot_has_gmem(fault->slot)) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
@@ -4502,15 +4511,14 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	}
 
 	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
-			     &fault->refcounted_page, &max_order);
+			     &fault->refcounted_page, &gmem_order);
 	if (r) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return r;
 	}
 
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
-	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
-							 fault->max_level, max_order);
+	fault->max_level = kvm_max_level_for_fault_and_order(vcpu->kvm, fault, gmem_order);
 
 	return RET_PF_CONTINUE;
 }
@@ -4520,8 +4528,8 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 {
 	unsigned int foll = fault->write ? FOLL_WRITE : 0;
 
-	if (fault->is_private)
-		return kvm_mmu_faultin_pfn_private(vcpu, fault);
+	if (fault_from_gmem(fault))
+		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
 
 	foll |= FOLL_NOWAIT;
 	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6326d1ad8225..c1c76794b25a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2524,10 +2524,31 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
 bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 					 struct kvm_gfn_range *range);
 
+/*
+ * Returns true if the given gfn's private/shared status (in the CoCo sense) is
+ * private.
+ *
+ * A return value of false indicates that the gfn is explicitly or implicitly
+ * shared (i.e., non-CoCo VMs).
+ */
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
-	return IS_ENABLED(CONFIG_KVM_GMEM) &&
-	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
+	struct kvm_memory_slot *slot;
+
+	if (!IS_ENABLED(CONFIG_KVM_GMEM))
+		return false;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
+		/*
+		 * Without in-place conversion support, if a guest_memfd memslot
+		 * supports shared memory, then all the slot's memory is
+		 * considered not private, i.e., implicitly shared.
+		 */
+		return false;
+	}
+
+	return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
 #else
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

Add kvm_gmem_max_mapping_level(), which for now always returns
PG_LEVEL_4K, since guest_memfd only supports 4K pages.

Once guest_memfd supports shared memory, max_mapping_level (especially
when recovering huge pages - see the call to
__kvm_mmu_max_mapping_level() from recover_huge_pages_range()) should
take input from guest_memfd.

Input from guest_memfd should be taken in these cases:

+ if the memslot supports shared memory (guest_memfd is used for
  shared memory, or in future both shared and private memory) or
+ if the memslot is only used for private memory and that gfn is
  private.

If the memslot doesn't use guest_memfd, figure out the
max_mapping_level using the host page tables like before.
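
In rough pseudocode (illustrative only; the helpers named here are the
ones introduced in the diff below), the resulting policy is:

	if (kvm_slot_has_gmem(slot) &&
	    (kvm_gmem_memslot_supports_shared(slot) ||
	     kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
		/* Consult guest_memfd for the mapping order. */
		max_level = kvm_gmem_max_mapping_level(slot, gfn, max_level);
	else
		/* Fall back to the host page tables, as before. */
		max_level = min(max_level, host_pfn_mapping_level(kvm, gfn, slot));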

Also refactor and inline the other call to
__kvm_mmu_max_mapping_level().

In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
provided (if applicable) in fault->max_level. Hence, there is no need
to query guest_memfd.

lpage_info is queried as before; then, if the fault is not from
guest_memfd, fault->req_level is adjusted based on input from the host
page tables.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 87 +++++++++++++++++++++++++---------------
 include/linux/kvm_host.h | 11 +++++
 virt/kvm/guest_memfd.c   | 12 ++++++
 3 files changed, 78 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5b7df2905aa9..9e0bc8114859 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
-static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
-				       const struct kvm_memory_slot *slot,
-				       gfn_t gfn, int max_level, bool is_private)
+static int kvm_lpage_info_max_mapping_level(struct kvm *kvm,
+					    const struct kvm_memory_slot *slot,
+					    gfn_t gfn, int max_level)
 {
 	struct kvm_lpage_info *linfo;
-	int host_level;
 
 	max_level = min(max_level, max_huge_page_level);
 	for ( ; max_level > PG_LEVEL_4K; max_level--) {
@@ -3270,28 +3269,61 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 			break;
 	}
 
-	if (is_private)
-		return max_level;
+	return max_level;
+}
+
+static inline u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
+					     gfn_t gfn, int max_level)
+{
+	int max_order;
 
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	host_level = host_pfn_mapping_level(kvm, gfn, slot);
-	return min(host_level, max_level);
+	max_order = kvm_gmem_mapping_order(slot, gfn);
+	return min(max_level, kvm_max_level_for_order(max_order));
 }
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	bool is_private = kvm_slot_has_gmem(slot) &&
-			  kvm_mem_is_private(kvm, gfn);
+	int max_level;
 
-	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
+	max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
+	if (kvm_slot_has_gmem(slot) &&
+	    (kvm_gmem_memslot_supports_shared(slot) ||
+	     kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+		return kvm_gmem_max_mapping_level(slot, gfn, max_level);
+	}
+
+	return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
 }
 
 static inline bool fault_from_gmem(struct kvm_page_fault *fault)
 {
-	return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
+	return fault->is_private ||
+	       (kvm_slot_has_gmem(fault->slot) &&
+		kvm_gmem_memslot_supports_shared(fault->slot));
 }
 
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -3314,12 +3346,20 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * Enforce the iTLB multihit workaround after capturing the requested
 	 * level, which will be used to do precise, accurate accounting.
 	 */
-	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						       fault->gfn, fault->max_level,
-						       fault->is_private);
+	fault->req_level = kvm_lpage_info_max_mapping_level(vcpu->kvm, slot,
+							    fault->gfn, fault->max_level);
 	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
 		return;
 
+	if (!fault_from_gmem(fault)) {
+		int host_level;
+
+		host_level = host_pfn_mapping_level(vcpu->kvm, fault->gfn, slot);
+		fault->req_level = min(fault->req_level, host_level);
+		if (fault->req_level == PG_LEVEL_4K)
+			return;
+	}
+
 	/*
 	 * mmu_invalidate_retry() was successful and mmu_lock is held, so
 	 * the pmd can't be split from under us.
@@ -4453,23 +4493,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 		vcpu->stat.pf_fixed++;
 }
 
-static inline u8 kvm_max_level_for_order(int order)
-{
-	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
-
-	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
-		return PG_LEVEL_1G;
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
-		return PG_LEVEL_2M;
-
-	return PG_LEVEL_4K;
-}
-
 static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
 					    struct kvm_page_fault *fault,
 					    int order)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c1c76794b25a..d55d870b354d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2551,6 +2551,10 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 	return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
 #else
+static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+{
+	return 0;
+}
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
 	return false;
@@ -2561,6 +2565,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2570,6 +2575,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
+					 gfn_t gfn)
+{
+	BUG();
+	return 0;
+}
 #endif /* CONFIG_KVM_GMEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index e0fa49699e05..b07e38fd91f5 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -716,6 +716,18 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
+/*
+ * Returns the mapping order for this @gfn in @slot.
+ *
+ * This is equal to the max_order that kvm_gmem_get_pfn() would return
+ * if it were called now.
+ */
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
+
 #ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs
  2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (10 preceding siblings ...)
  2025-06-05 15:37 ` [PATCH v11 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
@ 2025-06-05 15:37 ` Fuad Tabba
  2025-06-05 15:49   ` David Hildenbrand
  2025-06-05 15:37 ` [PATCH v11 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Define the architecture-specific macro to enable shared memory support
in guest_memfd for relevant software-only VM types, specifically
KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.

Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
enabled.
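
For reference, the generic capability check added later in this series
(patch 16) consumes this macro roughly as follows (sketch):

	case KVM_CAP_GMEM_SHARED_MEM:
		return !kvm || kvm_arch_supports_gmem_shared_mem(kvm);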

Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 10 ++++++++++
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/x86.c              |  3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 709cc2a7ba66..ce9ad4cd93c5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 #ifdef CONFIG_KVM_GMEM
 #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
+
+/*
+ * CoCo VMs with hardware support that use guest_memfd only for backing private
+ * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
+ */
+#define kvm_arch_supports_gmem_shared_mem(kvm)			\
+	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&			\
+	 ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||		\
+	  (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
 #else
 #define kvm_arch_supports_gmem(kvm) false
+#define kvm_arch_supports_gmem_shared_mem(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b37258253543..fdf24b50af9d 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -47,6 +47,7 @@ config KVM_X86
 	select KVM_GENERIC_HARDWARE_ENABLING
 	select KVM_GENERIC_PRE_FAULT_MEMORY
 	select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
+	select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
 	select KVM_WERROR if WERROR
 
 config KVM
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 035ced06b2dd..2a02f2457c42 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12718,7 +12718,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		return -EINVAL;
 
 	kvm->arch.vm_type = type;
-	kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
+	kvm->arch.supports_gmem =
+		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
 	/* Decided by the vendor code for other VM types.  */
 	kvm->arch.pre_fault_allowed =
 		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 13/18] KVM: arm64: Refactor user_mem_abort()
  2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (11 preceding siblings ...)
  2025-06-05 15:37 ` [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs Fuad Tabba
@ 2025-06-05 15:37 ` Fuad Tabba
  2025-06-09  0:27   ` Gavin Shan
  2025-06-05 15:37 ` [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

To simplify the code and to make the assumptions clearer, refactor
user_mem_abort() by setting force_pte to true up front, as soon as its
conditions (dirty logging active, or protected KVM enabled) are known.

Remove the comment claiming that logging_active is guaranteed to never
be true for VM_PFNMAP memslots, since it's not actually correct.

Move code that will be reused in the following patch into separate
functions.

Also apply a few other small tidy-ups.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 100 ++++++++++++++++++++++++-------------------
 1 file changed, 55 insertions(+), 45 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index eeda92330ade..ce80be116a30 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1466,13 +1466,56 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
+				void **memcache)
+{
+	int min_pages;
+
+	if (!is_protected_kvm_enabled())
+		*memcache = &vcpu->arch.mmu_page_cache;
+	else
+		*memcache = &vcpu->arch.pkvm_memcache;
+
+	if (!topup_memcache)
+		return 0;
+
+	min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
+
+	if (!is_protected_kvm_enabled())
+		return kvm_mmu_topup_memory_cache(*memcache, min_pages);
+
+	return topup_hyp_memcache(*memcache, min_pages);
+}
+
+/*
+ * Potentially reduce shadow S2 permissions to match the guest's own S2. For
+ * exec faults, we'd only reach this point if the guest actually allowed it (see
+ * kvm_s2_handle_perm_fault).
+ *
+ * Also encode the level of the original translation in the SW bits of the leaf
+ * entry as a proxy for the span of that translation. This will be retrieved on
+ * TLB invalidation from the guest and used to limit the invalidation scope if a
+ * TTL hint or a range isn't provided.
+ */
+static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
+				      enum kvm_pgtable_prot *prot,
+				      bool *writable)
+{
+	*writable &= kvm_s2_trans_writable(nested);
+	if (!kvm_s2_trans_readable(nested))
+		*prot &= ~KVM_PGTABLE_PROT_R;
+
+	*prot |= kvm_encode_nested_level(nested);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  bool fault_is_perm)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
+	bool topup_memcache;
+	bool write_fault, writable;
 	bool exec_fault, mte_allowed;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
@@ -1484,6 +1527,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
+	bool force_pte = logging_active || is_protected_kvm_enabled();
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
@@ -1501,28 +1545,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
-	if (!is_protected_kvm_enabled())
-		memcache = &vcpu->arch.mmu_page_cache;
-	else
-		memcache = &vcpu->arch.pkvm_memcache;
-
 	/*
 	 * Permission faults just need to update the existing leaf entry,
 	 * and so normally don't require allocations from the memcache. The
 	 * only exception to this is when dirty logging is enabled at runtime
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
-	if (!fault_is_perm || (logging_active && write_fault)) {
-		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
-
-		if (!is_protected_kvm_enabled())
-			ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
-		else
-			ret = topup_hyp_memcache(memcache, min_pages);
-
-		if (ret)
-			return ret;
-	}
+	topup_memcache = !fault_is_perm || (logging_active && write_fault);
+	ret = prepare_mmu_memcache(vcpu, topup_memcache, &memcache);
+	if (ret)
+		return ret;
 
 	/*
 	 * Let's check if we will get back a huge page backed by hugetlbfs, or
@@ -1536,16 +1568,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
-	/*
-	 * logging_active is guaranteed to never be true for VM_PFNMAP
-	 * memslots.
-	 */
-	if (logging_active || is_protected_kvm_enabled()) {
-		force_pte = true;
+	if (force_pte)
 		vma_shift = PAGE_SHIFT;
-	} else {
+	else
 		vma_shift = get_vma_page_shift(vma, hva);
-	}
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -1597,7 +1623,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			max_map_size = PAGE_SIZE;
 
 		force_pte = (max_map_size == PAGE_SIZE);
-		vma_pagesize = min(vma_pagesize, (long)max_map_size);
+		vma_pagesize = min_t(long, vma_pagesize, max_map_size);
 	}
 
 	/*
@@ -1626,7 +1652,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
 	 * with the smp_wmb() in kvm_mmu_invalidate_end().
 	 */
-	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+	mmu_seq = kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
 	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
@@ -1661,24 +1687,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault && device)
 		return -ENOEXEC;
 
-	/*
-	 * Potentially reduce shadow S2 permissions to match the guest's own
-	 * S2. For exec faults, we'd only reach this point if the guest
-	 * actually allowed it (see kvm_s2_handle_perm_fault).
-	 *
-	 * Also encode the level of the original translation in the SW bits
-	 * of the leaf entry as a proxy for the span of that translation.
-	 * This will be retrieved on TLB invalidation from the guest and
-	 * used to limit the invalidation scope if a TTL hint or a range
-	 * isn't provided.
-	 */
-	if (nested) {
-		writable &= kvm_s2_trans_writable(nested);
-		if (!kvm_s2_trans_readable(nested))
-			prot &= ~KVM_PGTABLE_PROT_R;
-
-		prot |= kvm_encode_nested_level(nested);
-	}
+	if (nested)
+		adjust_nested_fault_perms(nested, &prot, &writable);
 
 	kvm_fault_lock(kvm);
 	pgt = vcpu->arch.hw_mmu->pgt;
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (12 preceding siblings ...)
  2025-06-05 15:37 ` [PATCH v11 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
@ 2025-06-05 15:37 ` Fuad Tabba
  2025-06-05 17:21   ` James Houghton
  2025-06-09  4:08   ` Gavin Shan
  2025-06-05 15:37 ` [PATCH v11 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
                   ` (4 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Add arm64 support for handling guest page faults on guest_memfd backed
memslots. Until guest_memfd supports huge pages, the fault granule is
restricted to PAGE_SIZE.
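
In short, the abort path now dispatches as follows (a sketch of the
kvm_handle_guest_abort() change below, where is_perm stands for
esr_fsc_is_permission_fault(esr)):

	if (kvm_slot_has_gmem(memslot))
		ret = gmem_abort(vcpu, fault_ipa, nested, memslot, is_perm);
	else
		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
				     is_perm);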

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 93 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 90 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ce80be116a30..f14925fe6144 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1508,6 +1508,89 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
 	*prot |= kvm_encode_nested_level(nested);
 }
 
+#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
+
+static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+		      struct kvm_s2_trans *nested,
+		      struct kvm_memory_slot *memslot, bool is_perm)
+{
+	bool logging, write_fault, exec_fault, writable;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+	struct page *page;
+	struct kvm *kvm = vcpu->kvm;
+	void *memcache;
+	kvm_pfn_t pfn;
+	gfn_t gfn;
+	int ret;
+
+	ret = prepare_mmu_memcache(vcpu, !is_perm, &memcache);
+	if (ret)
+		return ret;
+
+	if (nested)
+		gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
+	else
+		gfn = fault_ipa >> PAGE_SHIFT;
+
+	logging = memslot_is_logging(memslot);
+	write_fault = kvm_is_write_fault(vcpu);
+	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
+
+	if (write_fault && exec_fault) {
+		kvm_err("Simultaneous write and execution fault\n");
+		return -EFAULT;
+	}
+
+	if (is_perm && !write_fault && !exec_fault) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
+	if (ret) {
+		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
+					      write_fault, exec_fault, false);
+		return ret;
+	}
+
+	writable = !(memslot->flags & KVM_MEM_READONLY) &&
+		   (!logging || write_fault);
+
+	if (nested)
+		adjust_nested_fault_perms(nested, &prot, &writable);
+
+	if (writable)
+		prot |= KVM_PGTABLE_PROT_W;
+
+	if (exec_fault ||
+	    (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
+	     (!nested || kvm_s2_trans_executable(nested))))
+		prot |= KVM_PGTABLE_PROT_X;
+
+	kvm_fault_lock(kvm);
+	if (is_perm) {
+		/*
+		 * Drop the SW bits in favour of those stored in the
+		 * PTE, which will be preserved.
+		 */
+		prot &= ~KVM_NV_GUEST_MAP_SZ;
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
+	} else {
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
+					     __pfn_to_phys(pfn), prot,
+					     memcache, flags);
+	}
+	kvm_release_faultin_page(kvm, page, !!ret, writable);
+	kvm_fault_unlock(kvm);
+
+	if (writable && !ret)
+		mark_page_dirty_in_slot(kvm, memslot, gfn);
+
+	return ret != -EAGAIN ? ret : 0;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1532,7 +1615,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
-	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
 
 	if (fault_is_perm)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1959,8 +2042,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
-			     esr_fsc_is_permission_fault(esr));
+	if (kvm_slot_has_gmem(memslot))
+		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
+				 esr_fsc_is_permission_fault(esr));
+	else
+		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
+				     esr_fsc_is_permission_fault(esr));
 	if (ret == 0)
 		ret = 1;
 out:
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory
  2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (13 preceding siblings ...)
  2025-06-05 15:37 ` [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
@ 2025-06-05 15:37 ` Fuad Tabba
  2025-06-05 17:26   ` James Houghton
  2025-06-09  0:29   ` Gavin Shan
  2025-06-05 15:37 ` [PATCH v11 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
                   ` (3 subsequent siblings)
  18 siblings, 2 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Enable the host mapping of guest_memfd-backed memory on arm64.

This applies to all current arm64 VM types that support guest_memfd.
Future VM types can restrict this behavior via the
kvm_arch_supports_gmem_shared_mem() hook if needed.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 5 +++++
 arch/arm64/kvm/Kconfig            | 1 +
 arch/arm64/kvm/mmu.c              | 7 +++++++
 3 files changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 08ba91e6fb03..8add94929711 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1593,4 +1593,9 @@ static inline bool kvm_arch_has_irq_bypass(void)
 	return true;
 }
 
+#ifdef CONFIG_KVM_GMEM
+#define kvm_arch_supports_gmem(kvm) true
+#define kvm_arch_supports_gmem_shared_mem(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
+#endif
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 096e45acadb2..8c1e1964b46a 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
+	select KVM_GMEM_SHARED_MEM
 	help
 	  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f14925fe6144..19aca1442bbf 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2281,6 +2281,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT))
 		return -EFAULT;
 
+	/*
+	 * Only support guest_memfd backed memslots with shared memory, since
+	 * there aren't any CoCo VMs that support only private memory on arm64.
+	 */
+	if (kvm_slot_has_gmem(new) && !kvm_gmem_memslot_supports_shared(new))
+		return -EINVAL;
+
 	hva = new->userspace_addr;
 	reg_end = hva + (new->npages << PAGE_SHIFT);
 
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
  2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (14 preceding siblings ...)
  2025-06-05 15:37 ` [PATCH v11 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
@ 2025-06-05 15:37 ` Fuad Tabba
  2025-06-05 15:37 ` [PATCH v11 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM, which indicates
that guest_memfd supports shared memory (when the
GUEST_MEMFD_FLAG_SUPPORT_SHARED flag is set at creation time). This
support is limited to certain VM types, determined per architecture.

Also update the KVM documentation with details on the new capability,
the flag, and other information about shared memory support in
guest_memfd.
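
For illustration, expected userspace usage looks roughly like the
sketch below (not part of this patch; vm_fd and size are placeholder
names, and the uapi definitions come from this series):

	struct kvm_create_guest_memfd args = {
		.size  = size,
		.flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED,
	};
	void *mem;
	int fd;

	if (!ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM))
		return;	/* shared guest_memfd not supported */

	fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);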

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 Documentation/virt/kvm/api.rst | 9 +++++++++
 include/uapi/linux/kvm.h       | 1 +
 virt/kvm/kvm_main.c            | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 47c7c3f92314..59f994a99481 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6390,6 +6390,15 @@ most one mapping per page, i.e. binding multiple memory regions to a single
 guest_memfd range is not allowed (any number of memory regions can be bound to
 a single guest_memfd file, but the bound ranges must not overlap).
 
+When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
+supports GUEST_MEMFD_FLAG_SUPPORT_SHARED.  Setting this flag on guest_memfd
+creation enables mmap() and faulting of guest_memfd memory to host userspace.
+
+When the KVM MMU performs a PFN lookup to service a guest fault and the backing
+guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED flag set, then the fault
+will always be consumed from guest_memfd, regardless of whether it is a shared
+or a private fault.
+
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
 4.143 KVM_PRE_FAULT_MEMORY
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c2714c9d1a0e..5aa85d34a29a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -930,6 +930,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
 #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
+#define KVM_CAP_GMEM_SHARED_MEM 240
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6289ea1685dd..64ed4da70d2f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4845,6 +4845,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_supports_gmem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+	case KVM_CAP_GMEM_SHARED_MEM:
+		return !kvm || kvm_arch_supports_gmem_shared_mem(kvm);
 #endif
 	default:
 		break;
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test
  2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (15 preceding siblings ...)
  2025-06-05 15:37 ` [PATCH v11 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
@ 2025-06-05 15:37 ` Fuad Tabba
  2025-06-06  8:15   ` David Hildenbrand
  2025-06-08 23:43   ` Gavin Shan
  2025-06-05 15:38 ` [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
  2025-06-06  9:18 ` [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
  18 siblings, 2 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 15:37 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Using hardcoded page size values could cause the test to fail on systems
that have larger pages, e.g., arm64 with 64kB pages. Use getpagesize()
instead.

Also, build the guest_memfd selftest for arm64.

Suggested-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm       |  1 +
 tools/testing/selftests/kvm/guest_memfd_test.c | 11 ++++++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..845fcaf8b6c9 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -164,6 +164,7 @@ TEST_GEN_PROGS_arm64 += arch_timer
 TEST_GEN_PROGS_arm64 += coalesced_io_test
 TEST_GEN_PROGS_arm64 += dirty_log_perf_test
 TEST_GEN_PROGS_arm64 += get-reg-list
+TEST_GEN_PROGS_arm64 += guest_memfd_test
 TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
 TEST_GEN_PROGS_arm64 += memslot_perf_test
 TEST_GEN_PROGS_arm64 += mmu_stress_test
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..341ba616cf55 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -146,24 +146,25 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 {
 	int fd1, fd2, ret;
 	struct stat st1, st2;
+	size_t page_size = getpagesize();
 
-	fd1 = __vm_create_guest_memfd(vm, 4096, 0);
+	fd1 = __vm_create_guest_memfd(vm, page_size, 0);
 	TEST_ASSERT(fd1 != -1, "memfd creation should succeed");
 
 	ret = fstat(fd1, &st1);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st1.st_size == 4096, "memfd st_size should match requested size");
+	TEST_ASSERT(st1.st_size == page_size, "memfd st_size should match requested size");
 
-	fd2 = __vm_create_guest_memfd(vm, 8192, 0);
+	fd2 = __vm_create_guest_memfd(vm, page_size * 2, 0);
 	TEST_ASSERT(fd2 != -1, "memfd creation should succeed");
 
 	ret = fstat(fd2, &st2);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st2.st_size == 8192, "second memfd st_size should match requested size");
+	TEST_ASSERT(st2.st_size == page_size * 2, "second memfd st_size should match requested size");
 
 	ret = fstat(fd1, &st1);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st1.st_size == 4096, "first memfd st_size should still match requested size");
+	TEST_ASSERT(st1.st_size == page_size, "first memfd st_size should still match requested size");
 	TEST_ASSERT(st1.st_ino != st2.st_ino, "different memfd should have different inode numbers");
 
 	close(fd2);
-- 
2.49.0.1266.g31b7d2e469-goog




* [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (16 preceding siblings ...)
  2025-06-05 15:37 ` [PATCH v11 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
@ 2025-06-05 15:38 ` Fuad Tabba
  2025-06-05 22:07   ` James Houghton
  2025-06-08 23:43   ` Gavin Shan
  2025-06-06  9:18 ` [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
  18 siblings, 2 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 15:38 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Expand the guest_memfd selftests to cover mmap() of guest memory from
the host, for VM types that support it.

Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 201 ++++++++++++++++--
 1 file changed, 180 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 341ba616cf55..1612d3adcd0d 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -13,6 +13,8 @@
 
 #include <linux/bitmap.h>
 #include <linux/falloc.h>
+#include <setjmp.h>
+#include <signal.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -34,12 +36,83 @@ static void test_file_read_write(int fd)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
+{
+	const char val = 0xaa;
+	char *mem;
+	size_t i;
+	int ret;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+	TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
+
+	memset(mem, val, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			page_size);
+	TEST_ASSERT(!ret, "fallocate the first page should succeed.");
+
+	for (i = 0; i < page_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0x00);
+	for (; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	memset(mem, val, page_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap() should succeed.");
+}
+
+static sigjmp_buf jmpbuf;
+void fault_sigbus_handler(int signum)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
+{
+	struct sigaction sa_old, sa_new = {
+		.sa_handler = fault_sigbus_handler,
+	};
+	size_t map_size = total_size * 4;
+	const char val = 0xaa;
+	char *mem;
+	size_t i;
+	int ret;
+
+	mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
+
+	sigaction(SIGBUS, &sa_new, &sa_old);
+	if (sigsetjmp(jmpbuf, 1) == 0) {
+		memset(mem, 0xaa, map_size);
+		TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
+	}
+	sigaction(SIGBUS, &sa_old, NULL);
+
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	ret = munmap(mem, map_size);
+	TEST_ASSERT(!ret, "munmap() should succeed.");
+}
+
+static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
 {
 	char *mem;
 
 	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 	TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, MAP_FAILED);
 }
 
 static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -120,26 +193,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
 	}
 }
 
-static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
+static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
+						  uint64_t guest_memfd_flags,
+						  size_t page_size)
 {
-	size_t page_size = getpagesize();
-	uint64_t flag;
 	size_t size;
 	int fd;
 
 	for (size = 1; size < page_size; size++) {
-		fd = __vm_create_guest_memfd(vm, size, 0);
-		TEST_ASSERT(fd == -1 && errno == EINVAL,
+		fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
+		TEST_ASSERT(fd < 0 && errno == EINVAL,
 			    "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
 			    size);
 	}
-
-	for (flag = BIT(0); flag; flag <<= 1) {
-		fd = __vm_create_guest_memfd(vm, page_size, flag);
-		TEST_ASSERT(fd == -1 && errno == EINVAL,
-			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
-			    flag);
-	}
 }
 
 static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
@@ -171,30 +237,123 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
 
-int main(int argc, char *argv[])
+static bool check_vm_type(unsigned long vm_type)
 {
-	size_t page_size;
+	/*
+	 * Not all architectures support KVM_CAP_VM_TYPES. However, those that
+	 * support guest_memfd do so for the default VM type.
+	 */
+	if (vm_type == VM_TYPE_DEFAULT)
+		return true;
+
+	return kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type);
+}
+
+static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
+			   bool expect_mmap_allowed)
+{
+	struct kvm_vm *vm;
 	size_t total_size;
+	size_t page_size;
 	int fd;
-	struct kvm_vm *vm;
 
-	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+	if (!check_vm_type(vm_type))
+		return;
 
 	page_size = getpagesize();
 	total_size = page_size * 4;
 
-	vm = vm_create_barebones();
+	vm = vm_create_barebones_type(vm_type);
 
-	test_create_guest_memfd_invalid(vm);
 	test_create_guest_memfd_multiple(vm);
+	test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
 
-	fd = vm_create_guest_memfd(vm, total_size, 0);
+	fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
 
 	test_file_read_write(fd);
-	test_mmap(fd, page_size);
+
+	if (expect_mmap_allowed) {
+		test_mmap_supported(fd, page_size, total_size);
+		test_fault_overflow(fd, page_size, total_size);
+	} else {
+		test_mmap_not_supported(fd, page_size, total_size);
+	}
+
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
 
 	close(fd);
+	kvm_vm_release(vm);
+}
+
+static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
+					    uint64_t expected_valid_flags)
+{
+	size_t page_size = getpagesize();
+	struct kvm_vm *vm;
+	uint64_t flag = 0;
+	int fd;
+
+	if (!check_vm_type(vm_type))
+		return;
+
+	vm = vm_create_barebones_type(vm_type);
+
+	for (flag = BIT(0); flag; flag <<= 1) {
+		fd = __vm_create_guest_memfd(vm, page_size, flag);
+
+		if (flag & expected_valid_flags) {
+			TEST_ASSERT(fd >= 0,
+				    "guest_memfd() with flag '0x%lx' should be valid",
+				    flag);
+			close(fd);
+		} else {
+			TEST_ASSERT(fd < 0 && errno == EINVAL,
+				    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
+				    flag);
+		}
+	}
+
+	kvm_vm_release(vm);
+}
+
+static void test_gmem_flag_validity(void)
+{
+	uint64_t non_coco_vm_valid_flags = 0;
+
+	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
+		non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+
+	test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
+
+#ifdef __x86_64__
+	test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, non_coco_vm_valid_flags);
+	test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
+#endif
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+
+	test_gmem_flag_validity();
+
+	test_with_type(VM_TYPE_DEFAULT, 0, false);
+	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
+		test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_SUPPORT_SHARED,
+			       true);
+	}
+
+#ifdef __x86_64__
+	test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
+	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
+		test_with_type(KVM_X86_SW_PROTECTED_VM,
+			       GUEST_MEMFD_FLAG_SUPPORT_SHARED, true);
+	}
+#endif
 }
-- 
2.49.0.1266.g31b7d2e469-goog




* Re: [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs
  2025-06-05 15:37 ` [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs Fuad Tabba
@ 2025-06-05 15:49   ` David Hildenbrand
  2025-06-05 16:11     ` Fuad Tabba
  0 siblings, 1 reply; 56+ messages in thread
From: David Hildenbrand @ 2025-06-05 15:49 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 05.06.25 17:37, Fuad Tabba wrote:
> Define the architecture-specific macro to enable shared memory support
> in guest_memfd for relevant software-only VM types, specifically
> KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
> 
> Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
> enabled.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h | 10 ++++++++++
>   arch/x86/kvm/Kconfig            |  1 +
>   arch/x86/kvm/x86.c              |  3 ++-
>   3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 709cc2a7ba66..ce9ad4cd93c5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>   
>   #ifdef CONFIG_KVM_GMEM
>   #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> +
> +/*
> + * CoCo VMs with hardware support that use guest_memfd only for backing private
> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> + */
> +#define kvm_arch_supports_gmem_shared_mem(kvm)			\
> +	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&			\
> +	 ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||		\
> +	  (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
>   #else
>   #define kvm_arch_supports_gmem(kvm) false
> +#define kvm_arch_supports_gmem_shared_mem(kvm) false
>   #endif
>   
>   #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index b37258253543..fdf24b50af9d 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -47,6 +47,7 @@ config KVM_X86
>   	select KVM_GENERIC_HARDWARE_ENABLING
>   	select KVM_GENERIC_PRE_FAULT_MEMORY
>   	select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
> +	select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
>   	select KVM_WERROR if WERROR

Are $subject and this still true, given that it's now also supported for 
KVM_X86_DEFAULT_VM?

-- 
Cheers,

David / dhildenb




* Re: [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs
  2025-06-05 15:49   ` David Hildenbrand
@ 2025-06-05 16:11     ` Fuad Tabba
  2025-06-05 17:35       ` David Hildenbrand
  0 siblings, 1 reply; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 16:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, 5 Jun 2025 at 16:49, David Hildenbrand <david@redhat.com> wrote:
>
> On 05.06.25 17:37, Fuad Tabba wrote:
> > Define the architecture-specific macro to enable shared memory support
> > in guest_memfd for relevant software-only VM types, specifically
> > KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
> >
> > Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
> > enabled.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/x86/include/asm/kvm_host.h | 10 ++++++++++
> >   arch/x86/kvm/Kconfig            |  1 +
> >   arch/x86/kvm/x86.c              |  3 ++-
> >   3 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 709cc2a7ba66..ce9ad4cd93c5 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >
> >   #ifdef CONFIG_KVM_GMEM
> >   #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> > +
> > +/*
> > + * CoCo VMs with hardware support that use guest_memfd only for backing private
> > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> > + */
> > +#define kvm_arch_supports_gmem_shared_mem(kvm)                       \
> > +     (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
> > +      ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
> > +       (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
> >   #else
> >   #define kvm_arch_supports_gmem(kvm) false
> > +#define kvm_arch_supports_gmem_shared_mem(kvm) false
> >   #endif
> >
> >   #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index b37258253543..fdf24b50af9d 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -47,6 +47,7 @@ config KVM_X86
> >       select KVM_GENERIC_HARDWARE_ENABLING
> >       select KVM_GENERIC_PRE_FAULT_MEMORY
> >       select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
> > +     select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
> >       select KVM_WERROR if WERROR
>
> Is $subject and this still true, given that it's now also supported for
> KVM_X86_DEFAULT_VM?

True, just not the whole truth :)

I guess a better subject would be "for software VMs" (i.e., drop "protected")?

/fuad
> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-06-05 15:37 ` [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
@ 2025-06-05 17:21   ` James Houghton
  2025-06-06  7:31     ` Fuad Tabba
  2025-06-09  4:08   ` Gavin Shan
  1 sibling, 1 reply; 56+ messages in thread
From: James Houghton @ 2025-06-05 17:21 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jun 5, 2025 at 8:38 AM Fuad Tabba <tabba@google.com> wrote:
>
> Add arm64 support for handling guest page faults on guest_memfd backed
> memslots. Until guest_memfd supports huge pages, the fault granule is
> restricted to PAGE_SIZE.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Hi Fuad, sorry for not getting back to you on v10. I like this patch
much better than the v9 version, thank you! Some small notes below.

> ---
>  arch/arm64/kvm/mmu.c | 93 ++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 90 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ce80be116a30..f14925fe6144 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1508,6 +1508,89 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
>         *prot |= kvm_encode_nested_level(nested);
>  }
>
> +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> +
> +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +                     struct kvm_s2_trans *nested,
> +                     struct kvm_memory_slot *memslot, bool is_perm)
> +{
> +       bool logging, write_fault, exec_fault, writable;
> +       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> +       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> +       struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> +       struct page *page;
> +       struct kvm *kvm = vcpu->kvm;
> +       void *memcache;
> +       kvm_pfn_t pfn;
> +       gfn_t gfn;
> +       int ret;
> +
> +       ret = prepare_mmu_memcache(vcpu, !is_perm, &memcache);
> +       if (ret)
> +               return ret;
> +
> +       if (nested)
> +               gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> +       else
> +               gfn = fault_ipa >> PAGE_SHIFT;
> +
> +       logging = memslot_is_logging(memslot);

AFAICT, `logging` will always be `false` for now, so we can simplify
this function quite a bit. And IMHO, it *should* be simplified, as it
cannot be tested.
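
(For illustration, with `logging` gone, the writable computation below
would reduce to something like

	writable = !(memslot->flags & KVM_MEM_READONLY);

and the `!logging || write_fault` clause disappears.)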

> +       write_fault = kvm_is_write_fault(vcpu);
> +       exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> +
> +       if (write_fault && exec_fault) {
> +               kvm_err("Simultaneous write and execution fault\n");
> +               return -EFAULT;
> +       }
> +
> +       if (is_perm && !write_fault && !exec_fault) {
> +               kvm_err("Unexpected L2 read permission error\n");
> +               return -EFAULT;
> +       }

I think, ideally, these above checks should be put into a separate
function and shared with user_mem_abort(). (The VM_BUG_ON(write_fault
&& exec_fault) that user_mem_abort() does seems fine to me, I don't see a
real need to change it to -EFAULT.)
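
For instance (sketch only, helper name invented):

static int check_fault_flags(bool write_fault, bool exec_fault, bool is_perm)
{
	if (write_fault && exec_fault) {
		kvm_err("Simultaneous write and execution fault\n");
		return -EFAULT;
	}

	if (is_perm && !write_fault && !exec_fault) {
		kvm_err("Unexpected L2 read permission error\n");
		return -EFAULT;
	}

	return 0;
}

Then both gmem_abort() and user_mem_abort() could call it (or some
variant thereof), with user_mem_abort() free to keep its VM_BUG_ON()
for the first case.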

> +
> +       ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> +       if (ret) {
> +               kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> +                                             write_fault, exec_fault, false);
> +               return ret;
> +       }
> +
> +       writable = !(memslot->flags & KVM_MEM_READONLY) &&
> +                  (!logging || write_fault);
> +
> +       if (nested)
> +               adjust_nested_fault_perms(nested, &prot, &writable);
> +
> +       if (writable)
> +               prot |= KVM_PGTABLE_PROT_W;
> +
> +       if (exec_fault ||
> +           (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
> +            (!nested || kvm_s2_trans_executable(nested))))
> +               prot |= KVM_PGTABLE_PROT_X;
> +
> +       kvm_fault_lock(kvm);
> +       if (is_perm) {
> +               /*
> +                * Drop the SW bits in favour of those stored in the
> +                * PTE, which will be preserved.
> +                */
> +               prot &= ~KVM_NV_GUEST_MAP_SZ;
> +               ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);

I think you should drop this `is_perm` path, as it is an optimization
for dirty logging, which we don't currently do. :)

When we want to add dirty logging support, we probably ought to move
this mapping code (the lines kvm_fault_lock() and kvm_fault_unlock())
into its own function and share it with user_mem_abort().
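
With both of those gone, the tail of this function reduces to just the
map path, e.g. (sketch):

	kvm_fault_lock(kvm);
	ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
						 __pfn_to_phys(pfn), prot,
						 memcache, flags);
	kvm_release_faultin_page(kvm, page, !!ret, writable);
	kvm_fault_unlock(kvm);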

> +       } else {
> +               ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
> +                                            __pfn_to_phys(pfn), prot,
> +                                            memcache, flags);
> +       }
> +       kvm_release_faultin_page(kvm, page, !!ret, writable);
> +       kvm_fault_unlock(kvm);
> +
> +       if (writable && !ret)
> +               mark_page_dirty_in_slot(kvm, memslot, gfn);
> +
> +       return ret != -EAGAIN ? ret : 0;
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                           struct kvm_s2_trans *nested,
>                           struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1532,7 +1615,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>         struct kvm_pgtable *pgt;
>         struct page *page;
> -       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
> +       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
>
>         if (fault_is_perm)
>                 fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
> @@ -1959,8 +2042,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>                 goto out_unlock;
>         }
>
> -       ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> -                            esr_fsc_is_permission_fault(esr));
> +       if (kvm_slot_has_gmem(memslot))
> +               ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> +                                esr_fsc_is_permission_fault(esr));
> +       else
> +               ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> +                                    esr_fsc_is_permission_fault(esr));

I like this split! Thank you!


>         if (ret == 0)
>                 ret = 1;
>  out:
> --
> 2.49.0.1266.g31b7d2e469-goog
>



* Re: [PATCH v11 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory
  2025-06-05 15:37 ` [PATCH v11 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
@ 2025-06-05 17:26   ` James Houghton
  2025-06-09  0:29   ` Gavin Shan
  1 sibling, 0 replies; 56+ messages in thread
From: James Houghton @ 2025-06-05 17:26 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jun 5, 2025 at 8:38 AM Fuad Tabba <tabba@google.com> wrote:
>
> Enable the host mapping of guest_memfd-backed memory on arm64.
>
> This applies to all current arm64 VM types that support guest_memfd.
> Future VM types can restrict this behavior via the
> kvm_arch_supports_gmem_shared_mem() hook if needed.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Feel free to add:

Reviewed-by: James Houghton <jthoughton@google.com>

> ---
>  arch/arm64/include/asm/kvm_host.h | 5 +++++
>  arch/arm64/kvm/Kconfig            | 1 +
>  arch/arm64/kvm/mmu.c              | 7 +++++++
>  3 files changed, 13 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 08ba91e6fb03..8add94929711 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1593,4 +1593,9 @@ static inline bool kvm_arch_has_irq_bypass(void)
>         return true;
>  }
>
> +#ifdef CONFIG_KVM_GMEM
> +#define kvm_arch_supports_gmem(kvm) true
> +#define kvm_arch_supports_gmem_shared_mem(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
> +#endif

Thanks!

> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 096e45acadb2..8c1e1964b46a 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -38,6 +38,7 @@ menuconfig KVM
>         select HAVE_KVM_VCPU_RUN_PID_CHANGE
>         select SCHED_INFO
>         select GUEST_PERF_EVENTS if PERF_EVENTS
> +       select KVM_GMEM_SHARED_MEM
>         help
>           Support hosting virtualized guest machines.
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index f14925fe6144..19aca1442bbf 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -2281,6 +2281,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>         if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT))
>                 return -EFAULT;
>
> +       /*
> +        * Only support guest_memfd backed memslots with shared memory, since
> +        * there aren't any CoCo VMs that support only private memory on arm64.
> +        */
> +       if (kvm_slot_has_gmem(new) && !kvm_gmem_memslot_supports_shared(new))
> +               return -EINVAL;
> +
>         hva = new->userspace_addr;
>         reg_end = hva + (new->npages << PAGE_SHIFT);
>
> --
> 2.49.0.1266.g31b7d2e469-goog
>



* Re: [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs
  2025-06-05 16:11     ` Fuad Tabba
@ 2025-06-05 17:35       ` David Hildenbrand
  2025-06-05 17:43         ` Fuad Tabba
  0 siblings, 1 reply; 56+ messages in thread
From: David Hildenbrand @ 2025-06-05 17:35 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 05.06.25 18:11, Fuad Tabba wrote:
> On Thu, 5 Jun 2025 at 16:49, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 05.06.25 17:37, Fuad Tabba wrote:
>>> Define the architecture-specific macro to enable shared memory support
>>> in guest_memfd for relevant software-only VM types, specifically
>>> KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
>>>
>>> Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
>>> enabled.
>>>
>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>>    arch/x86/include/asm/kvm_host.h | 10 ++++++++++
>>>    arch/x86/kvm/Kconfig            |  1 +
>>>    arch/x86/kvm/x86.c              |  3 ++-
>>>    3 files changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 709cc2a7ba66..ce9ad4cd93c5 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>>>
>>>    #ifdef CONFIG_KVM_GMEM
>>>    #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
>>> +
>>> +/*
>>> + * CoCo VMs with hardware support that use guest_memfd only for backing private
>>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
>>> + */
>>> +#define kvm_arch_supports_gmem_shared_mem(kvm)                       \
>>> +     (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
>>> +      ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
>>> +       (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
>>>    #else
>>>    #define kvm_arch_supports_gmem(kvm) false
>>> +#define kvm_arch_supports_gmem_shared_mem(kvm) false
>>>    #endif
>>>
>>>    #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
>>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>>> index b37258253543..fdf24b50af9d 100644
>>> --- a/arch/x86/kvm/Kconfig
>>> +++ b/arch/x86/kvm/Kconfig
>>> @@ -47,6 +47,7 @@ config KVM_X86
>>>        select KVM_GENERIC_HARDWARE_ENABLING
>>>        select KVM_GENERIC_PRE_FAULT_MEMORY
>>>        select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
>>> +     select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
>>>        select KVM_WERROR if WERROR
>>
>> Is $subject and this still true, given that it's now also supported for
>> KVM_X86_DEFAULT_VM?
> 
> True, just not the whole truth :)
> 
> I guess a better one would be, for Software VMs (remove protected)?

Now I am curious, what is a Hardware VM? :)

-- 
Cheers,

David / dhildenb




* Re: [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs
  2025-06-05 17:35       ` David Hildenbrand
@ 2025-06-05 17:43         ` Fuad Tabba
  2025-06-05 17:45           ` David Hildenbrand
  0 siblings, 1 reply; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 17:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, 5 Jun 2025 at 18:35, David Hildenbrand <david@redhat.com> wrote:
>
> On 05.06.25 18:11, Fuad Tabba wrote:
> > On Thu, 5 Jun 2025 at 16:49, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 05.06.25 17:37, Fuad Tabba wrote:
> >>> Define the architecture-specific macro to enable shared memory support
> >>> in guest_memfd for relevant software-only VM types, specifically
> >>> KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
> >>>
> >>> Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
> >>> enabled.
> >>>
> >>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>> ---
> >>>    arch/x86/include/asm/kvm_host.h | 10 ++++++++++
> >>>    arch/x86/kvm/Kconfig            |  1 +
> >>>    arch/x86/kvm/x86.c              |  3 ++-
> >>>    3 files changed, 13 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>> index 709cc2a7ba66..ce9ad4cd93c5 100644
> >>> --- a/arch/x86/include/asm/kvm_host.h
> >>> +++ b/arch/x86/include/asm/kvm_host.h
> >>> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >>>
> >>>    #ifdef CONFIG_KVM_GMEM
> >>>    #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> >>> +
> >>> +/*
> >>> + * CoCo VMs with hardware support that use guest_memfd only for backing private
> >>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> >>> + */
> >>> +#define kvm_arch_supports_gmem_shared_mem(kvm)                       \
> >>> +     (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
> >>> +      ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
> >>> +       (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
> >>>    #else
> >>>    #define kvm_arch_supports_gmem(kvm) false
> >>> +#define kvm_arch_supports_gmem_shared_mem(kvm) false
> >>>    #endif
> >>>
> >>>    #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> >>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> >>> index b37258253543..fdf24b50af9d 100644
> >>> --- a/arch/x86/kvm/Kconfig
> >>> +++ b/arch/x86/kvm/Kconfig
> >>> @@ -47,6 +47,7 @@ config KVM_X86
> >>>        select KVM_GENERIC_HARDWARE_ENABLING
> >>>        select KVM_GENERIC_PRE_FAULT_MEMORY
> >>>        select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
> >>> +     select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
> >>>        select KVM_WERROR if WERROR
> >>
> >> Is $subject and this still true, given that it's now also supported for
> >> KVM_X86_DEFAULT_VM?
> >
> > True, just not the whole truth :)
> >
> > I guess a better one would be, for Software VMs (remove protected)?
>
> Now I am curious, what is a Hardware VM? :)

The opposite of a software one! ;) i.e., hardware-supported CoCo,
e.g., TDX, CCA...

Cheers,
/fuad
> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs
  2025-06-05 17:43         ` Fuad Tabba
@ 2025-06-05 17:45           ` David Hildenbrand
  2025-06-05 18:29             ` Fuad Tabba
  0 siblings, 1 reply; 56+ messages in thread
From: David Hildenbrand @ 2025-06-05 17:45 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 05.06.25 19:43, Fuad Tabba wrote:
> On Thu, 5 Jun 2025 at 18:35, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 05.06.25 18:11, Fuad Tabba wrote:
>>> On Thu, 5 Jun 2025 at 16:49, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 05.06.25 17:37, Fuad Tabba wrote:
>>>>> Define the architecture-specific macro to enable shared memory support
>>>>> in guest_memfd for relevant software-only VM types, specifically
>>>>> KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
>>>>>
>>>>> Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
>>>>> enabled.
>>>>>
>>>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>>>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>>>> ---
>>>>>     arch/x86/include/asm/kvm_host.h | 10 ++++++++++
>>>>>     arch/x86/kvm/Kconfig            |  1 +
>>>>>     arch/x86/kvm/x86.c              |  3 ++-
>>>>>     3 files changed, 13 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>> index 709cc2a7ba66..ce9ad4cd93c5 100644
>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>>>>>
>>>>>     #ifdef CONFIG_KVM_GMEM
>>>>>     #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
>>>>> +
>>>>> +/*
>>>>> + * CoCo VMs with hardware support that use guest_memfd only for backing private
>>>>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
>>>>> + */
>>>>> +#define kvm_arch_supports_gmem_shared_mem(kvm)                       \
>>>>> +     (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
>>>>> +      ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
>>>>> +       (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
>>>>>     #else
>>>>>     #define kvm_arch_supports_gmem(kvm) false
>>>>> +#define kvm_arch_supports_gmem_shared_mem(kvm) false
>>>>>     #endif
>>>>>
>>>>>     #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
>>>>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>>>>> index b37258253543..fdf24b50af9d 100644
>>>>> --- a/arch/x86/kvm/Kconfig
>>>>> +++ b/arch/x86/kvm/Kconfig
>>>>> @@ -47,6 +47,7 @@ config KVM_X86
>>>>>         select KVM_GENERIC_HARDWARE_ENABLING
>>>>>         select KVM_GENERIC_PRE_FAULT_MEMORY
>>>>>         select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
>>>>> +     select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
>>>>>         select KVM_WERROR if WERROR
>>>>
>>>> Is $subject and this still true, given that it's now also supported for
>>>> KVM_X86_DEFAULT_VM?
>>>
>>> True, just not the whole truth :)
>>>
>>> I guess a better one would be, for Software VMs (remove protected)?
>>
>> Now I am curious, what is a Hardware VM? :)
> 
> The opposite of a software one! ;) i.e., hardware-supported CoCo,
> e.g., TDX, CCA...

So, you mean a software VM is ... just an ordinary VM? :P

"KVM: x86: Enable guest_memfd shared memory for ordinary (non-CoCo) VMs" ?

But, whatever you prefer :)

-- 
Cheers,

David / dhildenb




* Re: [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs
  2025-06-05 17:45           ` David Hildenbrand
@ 2025-06-05 18:29             ` Fuad Tabba
  0 siblings, 0 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-05 18:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, 5 Jun 2025 at 18:45, David Hildenbrand <david@redhat.com> wrote:
>
> On 05.06.25 19:43, Fuad Tabba wrote:
> > On Thu, 5 Jun 2025 at 18:35, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 05.06.25 18:11, Fuad Tabba wrote:
> >>> On Thu, 5 Jun 2025 at 16:49, David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 05.06.25 17:37, Fuad Tabba wrote:
> >>>>> Define the architecture-specific macro to enable shared memory support
> >>>>> in guest_memfd for relevant software-only VM types, specifically
> >>>>> KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
> >>>>>
> >>>>> Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
> >>>>> enabled.
> >>>>>
> >>>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> >>>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>>>> ---
> >>>>>     arch/x86/include/asm/kvm_host.h | 10 ++++++++++
> >>>>>     arch/x86/kvm/Kconfig            |  1 +
> >>>>>     arch/x86/kvm/x86.c              |  3 ++-
> >>>>>     3 files changed, 13 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>>> index 709cc2a7ba66..ce9ad4cd93c5 100644
> >>>>> --- a/arch/x86/include/asm/kvm_host.h
> >>>>> +++ b/arch/x86/include/asm/kvm_host.h
> >>>>> @@ -2255,8 +2255,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >>>>>
> >>>>>     #ifdef CONFIG_KVM_GMEM
> >>>>>     #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> >>>>> +
> >>>>> +/*
> >>>>> + * CoCo VMs with hardware support that use guest_memfd only for backing private
> >>>>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> >>>>> + */
> >>>>> +#define kvm_arch_supports_gmem_shared_mem(kvm)                       \
> >>>>> +     (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&                      \
> >>>>> +      ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||             \
> >>>>> +       (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
> >>>>>     #else
> >>>>>     #define kvm_arch_supports_gmem(kvm) false
> >>>>> +#define kvm_arch_supports_gmem_shared_mem(kvm) false
> >>>>>     #endif
> >>>>>
> >>>>>     #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> >>>>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> >>>>> index b37258253543..fdf24b50af9d 100644
> >>>>> --- a/arch/x86/kvm/Kconfig
> >>>>> +++ b/arch/x86/kvm/Kconfig
> >>>>> @@ -47,6 +47,7 @@ config KVM_X86
> >>>>>         select KVM_GENERIC_HARDWARE_ENABLING
> >>>>>         select KVM_GENERIC_PRE_FAULT_MEMORY
> >>>>>         select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
> >>>>> +     select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
> >>>>>         select KVM_WERROR if WERROR
> >>>>
> >>>> Is $subject and this still true, given that it's now also supported for
> >>>> KVM_X86_DEFAULT_VM?
> >>>
> >>> True, just not the whole truth :)
> >>>
> >>> I guess a better one would be, for Software VMs (remove protected)?
> >>
> >> Now I am curious, what is a Hardware VM? :)
> >
> > The opposite of a software one! ;) i.e., hardware-supported CoCo,
> > e.g., TDX, CCA...
>
> So, you mean a software VM is ... just an ordinary VM? :P
>
> "KVM: x86: Enable guest_memfd shared memory for ordinary (non-CoCo) VMs" ?
>
> But, whatever you prefer :)

This sounds better. I was thrown off by the KVM_SW_PROTECTED_VM type :)

/fuad

> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-06-05 15:38 ` [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
@ 2025-06-05 22:07   ` James Houghton
  2025-06-05 22:12     ` Sean Christopherson
  2025-06-06  8:14     ` Fuad Tabba
  2025-06-08 23:43   ` Gavin Shan
  1 sibling, 2 replies; 56+ messages in thread
From: James Houghton @ 2025-06-05 22:07 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jun 5, 2025 at 8:38 AM Fuad Tabba <tabba@google.com> wrote:
>
> Expand the guest_memfd selftests to include testing mapping guest
> memory for VM types that support it.
>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Feel free to add:

Reviewed-by: James Houghton <jthoughton@google.com>

> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 201 ++++++++++++++++--
>  1 file changed, 180 insertions(+), 21 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 341ba616cf55..1612d3adcd0d 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -13,6 +13,8 @@
>
>  #include <linux/bitmap.h>
>  #include <linux/falloc.h>
> +#include <setjmp.h>
> +#include <signal.h>
>  #include <sys/mman.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> @@ -34,12 +36,83 @@ static void test_file_read_write(int fd)
>                     "pwrite on a guest_mem fd should fail");
>  }
>
> -static void test_mmap(int fd, size_t page_size)
> +static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
> +{
> +       const char val = 0xaa;
> +       char *mem;

This must be `volatile char *` to ensure that the compiler doesn't
elide the accesses you have written.

> +       size_t i;
> +       int ret;
> +
> +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> +       TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
> +
> +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +       TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
> +
> +       memset(mem, val, total_size);

Now unfortunately, `memset` and `munmap` will complain about the
volatile qualification. So...

memset((char *)mem, val, total_size);

Eh... wish they just wouldn't complain, but this is a small price to
pay for correctness. :)

> +       for (i = 0; i < total_size; i++)
> +               TEST_ASSERT_EQ(mem[i], val);

The compiler is allowed to[1] elide the read of `mem[i]` and just
assume that it is `val`.

[1]: https://godbolt.org/z/Wora54bP6

Feel free to add `volatile` to that snippet to see how the code changes.
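
For reference, the volatile variant of the snippet above would look
something like this (sketch):

	volatile char *mem;

	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");

	memset((char *)mem, val, total_size);	/* cast away volatile for libc */
	for (i = 0; i < total_size; i++)
		TEST_ASSERT_EQ(mem[i], val);	/* volatile read, cannot be elided */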

> +
> +       ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
> +                       page_size);
> +       TEST_ASSERT(!ret, "fallocate the first page should succeed.");
> +
> +       for (i = 0; i < page_size; i++)
> +               TEST_ASSERT_EQ(mem[i], 0x00);
> +       for (; i < total_size; i++)
> +               TEST_ASSERT_EQ(mem[i], val);
> +
> +       memset(mem, val, page_size);
> +       for (i = 0; i < total_size; i++)
> +               TEST_ASSERT_EQ(mem[i], val);
> +
> +       ret = munmap(mem, total_size);
> +       TEST_ASSERT(!ret, "munmap() should succeed.");
> +}
> +
> +static sigjmp_buf jmpbuf;
> +void fault_sigbus_handler(int signum)
> +{
> +       siglongjmp(jmpbuf, 1);
> +}
> +
> +static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
> +{
> +       struct sigaction sa_old, sa_new = {
> +               .sa_handler = fault_sigbus_handler,
> +       };
> +       size_t map_size = total_size * 4;
> +       const char val = 0xaa;
> +       char *mem;

`volatile` here as well.

> +       size_t i;
> +       int ret;
> +
> +       mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +       TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
> +
> +       sigaction(SIGBUS, &sa_new, &sa_old);
> +       if (sigsetjmp(jmpbuf, 1) == 0) {
> +               memset(mem, 0xaa, map_size);
> +               TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
> +       }
> +       sigaction(SIGBUS, &sa_old, NULL);
> +
> +       for (i = 0; i < total_size; i++)
> +               TEST_ASSERT_EQ(mem[i], val);
> +
> +       ret = munmap(mem, map_size);
> +       TEST_ASSERT(!ret, "munmap() should succeed.");
> +}
> +
> +static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
>  {
>         char *mem;
>
>         mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>         TEST_ASSERT_EQ(mem, MAP_FAILED);
> +
> +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +       TEST_ASSERT_EQ(mem, MAP_FAILED);
>  }
>
>  static void test_file_size(int fd, size_t page_size, size_t total_size)
> @@ -120,26 +193,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
>         }
>  }
>
> -static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
> +static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
> +                                                 uint64_t guest_memfd_flags,
> +                                                 size_t page_size)
>  {
> -       size_t page_size = getpagesize();
> -       uint64_t flag;
>         size_t size;
>         int fd;
>
>         for (size = 1; size < page_size; size++) {
> -               fd = __vm_create_guest_memfd(vm, size, 0);
> -               TEST_ASSERT(fd == -1 && errno == EINVAL,
> +               fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
> +               TEST_ASSERT(fd < 0 && errno == EINVAL,
>                             "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
>                             size);
>         }
> -
> -       for (flag = BIT(0); flag; flag <<= 1) {
> -               fd = __vm_create_guest_memfd(vm, page_size, flag);
> -               TEST_ASSERT(fd == -1 && errno == EINVAL,
> -                           "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> -                           flag);
> -       }
>  }
>
>  static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> @@ -171,30 +237,123 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
>         close(fd1);
>  }
>
> -int main(int argc, char *argv[])
> +static bool check_vm_type(unsigned long vm_type)
>  {
> -       size_t page_size;
> +       /*
> +        * Not all architectures support KVM_CAP_VM_TYPES. However, those that
> +        * support guest_memfd have that support for the default VM type.
> +        */
> +       if (vm_type == VM_TYPE_DEFAULT)
> +               return true;
> +
> +       return kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type);
> +}
> +
> +static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
> +                          bool expect_mmap_allowed)
> +{
> +       struct kvm_vm *vm;
>         size_t total_size;
> +       size_t page_size;
>         int fd;
> -       struct kvm_vm *vm;
>
> -       TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> +       if (!check_vm_type(vm_type))
> +               return;
>
>         page_size = getpagesize();
>         total_size = page_size * 4;
>
> -       vm = vm_create_barebones();
> +       vm = vm_create_barebones_type(vm_type);
>
> -       test_create_guest_memfd_invalid(vm);
>         test_create_guest_memfd_multiple(vm);
> +       test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
>
> -       fd = vm_create_guest_memfd(vm, total_size, 0);
> +       fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
>
>         test_file_read_write(fd);
> -       test_mmap(fd, page_size);
> +
> +       if (expect_mmap_allowed) {
> +               test_mmap_supported(fd, page_size, total_size);
> +               test_fault_overflow(fd, page_size, total_size);
> +
> +       } else {
> +               test_mmap_not_supported(fd, page_size, total_size);
> +       }
> +
>         test_file_size(fd, page_size, total_size);
>         test_fallocate(fd, page_size, total_size);
>         test_invalid_punch_hole(fd, page_size, total_size);
>
>         close(fd);
> +       kvm_vm_release(vm);

I think kvm_vm_free() is probably more appropriate?

> +}
> +
> +static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
> +                                           uint64_t expected_valid_flags)
> +{
> +       size_t page_size = getpagesize();
> +       struct kvm_vm *vm;
> +       uint64_t flag = 0;
> +       int fd;
> +
> +       if (!check_vm_type(vm_type))
> +               return;
> +
> +       vm = vm_create_barebones_type(vm_type);
> +
> +       for (flag = BIT(0); flag; flag <<= 1) {
> +               fd = __vm_create_guest_memfd(vm, page_size, flag);
> +
> +               if (flag & expected_valid_flags) {
> +                       TEST_ASSERT(fd >= 0,
> +                                   "guest_memfd() with flag '0x%lx' should be valid",
> +                                   flag);
> +                       close(fd);
> +               } else {
> +                       TEST_ASSERT(fd < 0 && errno == EINVAL,
> +                                   "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> +                                   flag);
> +               }
> +       }
> +
> +       kvm_vm_release(vm);

Same here.

> +}
> +
> +static void test_gmem_flag_validity(void)
> +{
> +       uint64_t non_coco_vm_valid_flags = 0;
> +
> +       if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
> +               non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
> +       test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
> +
> +#ifdef __x86_64__
> +       test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, non_coco_vm_valid_flags);
> +       test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
> +       test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
> +       test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
> +       test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
> +#endif
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +       TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> +
> +       test_gmem_flag_validity();
> +
> +       test_with_type(VM_TYPE_DEFAULT, 0, false);
> +       if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> +               test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_SUPPORT_SHARED,
> +                              true);
> +       }
> +
> +#ifdef __x86_64__
> +       test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
> +       if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> +               test_with_type(KVM_X86_SW_PROTECTED_VM,
> +                              GUEST_MEMFD_FLAG_SUPPORT_SHARED, true);
> +       }
> +#endif
>  }
> --
> 2.49.0.1266.g31b7d2e469-goog
>



* Re: [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-06-05 22:07   ` James Houghton
@ 2025-06-05 22:12     ` Sean Christopherson
  2025-06-05 22:17       ` James Houghton
  2025-06-06  8:14     ` Fuad Tabba
  1 sibling, 1 reply; 56+ messages in thread
From: Sean Christopherson @ 2025-06-05 22:12 UTC (permalink / raw)
  To: James Houghton
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jun 05, 2025, James Houghton wrote:
> On Thu, Jun 5, 2025 at 8:38 AM Fuad Tabba <tabba@google.com> wrote:
> > @@ -34,12 +36,83 @@ static void test_file_read_write(int fd)
> >                     "pwrite on a guest_mem fd should fail");
> >  }
> >
> > -static void test_mmap(int fd, size_t page_size)
> > +static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
> > +{
> > +       const char val = 0xaa;
> > +       char *mem;
> 
> This must be `volatile char *` to ensure that the compiler doesn't
> elide the accesses you have written.
> 
> > +       size_t i;
> > +       int ret;
> > +
> > +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> > +       TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
> > +
> > +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +       TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
> > +
> > +       memset(mem, val, total_size);
> 
> Now unfortunately, `memset` and `munmap` will complain about the
> volatile qualification. So...
> 
> memset((char *)mem, val, total_size);
> 
> Eh... wish they just wouldn't complain, but this is a small price to
> pay for correctness. :)
> 
> > +       for (i = 0; i < total_size; i++)
> > +               TEST_ASSERT_EQ(mem[i], val);
> 
> The compiler is allowed to[1] elide the read of `mem[i]` and just
> assume that it is `val`.

I don't think "volatile" is needed.  Won't READ_ONCE(mem[i]) do the trick?  That
in turn will force the compiler to emit the stores as well.
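
I.e., keep `mem` a plain `char *` and do (sketch; READ_ONCE() is already
used elsewhere in the KVM selftests):

	memset(mem, val, total_size);
	for (i = 0; i < total_size; i++)
		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);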

> [1]: https://godbolt.org/z/Wora54bP6
> 
> Feel free to add `volatile` to that snippet to see how the code changes.



* Re: [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-06-05 22:12     ` Sean Christopherson
@ 2025-06-05 22:17       ` James Houghton
  0 siblings, 0 replies; 56+ messages in thread
From: James Houghton @ 2025-06-05 22:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jun 5, 2025 at 3:12 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jun 05, 2025, James Houghton wrote:
> > On Thu, Jun 5, 2025 at 8:38 AM Fuad Tabba <tabba@google.com> wrote:
> > > @@ -34,12 +36,83 @@ static void test_file_read_write(int fd)
> > >                     "pwrite on a guest_mem fd should fail");
> > >  }
> > >
> > > -static void test_mmap(int fd, size_t page_size)
> > > +static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
> > > +{
> > > +       const char val = 0xaa;
> > > +       char *mem;
> >
> > This must be `volatile char *` to ensure that the compiler doesn't
> > elide the accesses you have written.
> >
> > > +       size_t i;
> > > +       int ret;
> > > +
> > > +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> > > +       TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
> > > +
> > > +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > > +       TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
> > > +
> > > +       memset(mem, val, total_size);
> >
> > Now unfortunately, `memset` and `munmap` will complain about the
> > volatile qualification. So...
> >
> > memset((char *)mem, val, total_size);
> >
> > Eh... wish they just wouldn't complain, but this is a small price to
> > pay for correctness. :)
> >
> > > +       for (i = 0; i < total_size; i++)
> > > +               TEST_ASSERT_EQ(mem[i], val);
> >
> > The compiler is allowed to[1] elide the read of `mem[i]` and just
> > assume that it is `val`.
>
> I don't think "volatile" is needed.  Won't READ_ONCE(mem[i]) do the trick?  That
> in turn will force the compiler to emit the stores as well.

Yeah `volatile` is only needed on the reads. READ_ONCE() implies a
`volatile` read, so if you want to write it that way, that's fine too.

I prefer my original suggestion though; it's less likely for there to
be a bug. :)
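
To make the footgun concrete (hypothetical snippet): with READ_ONCE()
every access has to be wrapped individually, so one plain access quietly
loses the guarantee,

	TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);	/* forced read */
	TEST_ASSERT_EQ(mem[i], 0x00);		/* plain read, may be elided */

whereas with `volatile char *mem` the qualifier travels with the pointer
type and every dereference is a volatile access.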

> > [1]: https://godbolt.org/z/Wora54bP6
> >
> > Feel free to add `volatile` to that snippet to see how the code changes.



* Re: [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-06-05 17:21   ` James Houghton
@ 2025-06-06  7:31     ` Fuad Tabba
  2025-06-06  7:39       ` David Hildenbrand
  0 siblings, 1 reply; 56+ messages in thread
From: Fuad Tabba @ 2025-06-06  7:31 UTC (permalink / raw)
  To: James Houghton
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
	pankaj.gupta, ira.weiny

Hi James,

On Thu, 5 Jun 2025 at 18:21, James Houghton <jthoughton@google.com> wrote:
>
> On Thu, Jun 5, 2025 at 8:38 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > Add arm64 support for handling guest page faults on guest_memfd backed
> > memslots. Until guest_memfd supports huge pages, the fault granule is
> > restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
>
> Hi Fuad, sorry for not getting back to you on v10. I like this patch
> much better than the v9 version, thank you! Some small notes below.
>
> > ---
> >  arch/arm64/kvm/mmu.c | 93 ++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 90 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index ce80be116a30..f14925fe6144 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1508,6 +1508,89 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
> >         *prot |= kvm_encode_nested_level(nested);
> >  }
> >
> > +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> > +
> > +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +                     struct kvm_s2_trans *nested,
> > +                     struct kvm_memory_slot *memslot, bool is_perm)
> > +{
> > +       bool logging, write_fault, exec_fault, writable;
> > +       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> > +       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > +       struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > +       struct page *page;
> > +       struct kvm *kvm = vcpu->kvm;
> > +       void *memcache;
> > +       kvm_pfn_t pfn;
> > +       gfn_t gfn;
> > +       int ret;
> > +
> > +       ret = prepare_mmu_memcache(vcpu, !is_perm, &memcache);
> > +       if (ret)
> > +               return ret;
> > +
> > +       if (nested)
> > +               gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> > +       else
> > +               gfn = fault_ipa >> PAGE_SHIFT;
> > +
> > +       logging = memslot_is_logging(memslot);
>
> AFAICT, `logging` will always be `false` for now, so we can simplify
> this function quite a bit. And IMHO, it *should* be simplified, as it
> cannot be tested.

Ack.

> > +       write_fault = kvm_is_write_fault(vcpu);
> > +       exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> > +
> > +       if (write_fault && exec_fault) {
> > +               kvm_err("Simultaneous write and execution fault\n");
> > +               return -EFAULT;
> > +       }
> > +
> > +       if (is_perm && !write_fault && !exec_fault) {
> > +               kvm_err("Unexpected L2 read permission error\n");
> > +               return -EFAULT;
> > +       }
>
> I think, ideally, these above checks should be put into a separate
> function and shared with user_mem_abort(). (The VM_BUG_ON(write_fault
> && exec_fault) that user_mem_abort() does seems fine to me, I don't see a
> real need to change it to -EFAULT.)

I would like to do that, however, I didn't want to change
user_mem_abort(), and regarding the VM_BUG_ON, see David's feedback to
V10:

https://lore.kernel.org/all/ed1928ce-fc6f-4aaa-9f54-126a8af12240@redhat.com/



> > +
> > +       ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> > +       if (ret) {
> > +               kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> > +                                             write_fault, exec_fault, false);
> > +               return ret;
> > +       }
> > +
> > +       writable = !(memslot->flags & KVM_MEM_READONLY) &&
> > +                  (!logging || write_fault);
> > +
> > +       if (nested)
> > +               adjust_nested_fault_perms(nested, &prot, &writable);
> > +
> > +       if (writable)
> > +               prot |= KVM_PGTABLE_PROT_W;
> > +
> > +       if (exec_fault ||
> > +           (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
> > +            (!nested || kvm_s2_trans_executable(nested))))
> > +               prot |= KVM_PGTABLE_PROT_X;
> > +
> > +       kvm_fault_lock(kvm);
> > +       if (is_perm) {
> > +               /*
> > +                * Drop the SW bits in favour of those stored in the
> > +                * PTE, which will be preserved.
> > +                */
> > +               prot &= ~KVM_NV_GUEST_MAP_SZ;
> > +               ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
>
> I think you should drop this `is_perm` path, as it is an optimization
> for dirty logging, which we don't currently do. :)
>
> When we want to add dirty logging support, we probably ought to move
> this mapping code (the lines kvm_fault_lock() and kvm_fault_unlock())
> into its own function and share it with user_mem_abort().

Ack.

> > +       } else {
> > +               ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
> > +                                            __pfn_to_phys(pfn), prot,
> > +                                            memcache, flags);
> > +       }
> > +       kvm_release_faultin_page(kvm, page, !!ret, writable);
> > +       kvm_fault_unlock(kvm);
> > +
> > +       if (writable && !ret)
> > +               mark_page_dirty_in_slot(kvm, memslot, gfn);
> > +
> > +       return ret != -EAGAIN ? ret : 0;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >                           struct kvm_s2_trans *nested,
> >                           struct kvm_memory_slot *memslot, unsigned long hva,
> > @@ -1532,7 +1615,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> >         struct kvm_pgtable *pgt;
> >         struct page *page;
> > -       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
> > +       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> >
> >         if (fault_is_perm)
> >                 fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
> > @@ -1959,8 +2042,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >                 goto out_unlock;
> >         }
> >
> > -       ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> > -                            esr_fsc_is_permission_fault(esr));
> > +       if (kvm_slot_has_gmem(memslot))
> > +               ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> > +                                esr_fsc_is_permission_fault(esr));
> > +       else
> > +               ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> > +                                    esr_fsc_is_permission_fault(esr));
>
> I like this split! Thank you!

Thank you for your reviews!

Cheers,
/fuad

>
> >         if (ret == 0)
> >                 ret = 1;
> >  out:
> > --
> > 2.49.0.1266.g31b7d2e469-goog
> >



* Re: [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-06-06  7:31     ` Fuad Tabba
@ 2025-06-06  7:39       ` David Hildenbrand
  0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2025-06-06  7:39 UTC (permalink / raw)
  To: Fuad Tabba, James Houghton
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta,
	ira.weiny

>>> +       write_fault = kvm_is_write_fault(vcpu);
>>> +       exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
>>> +
>>> +       if (write_fault && exec_fault) {
>>> +               kvm_err("Simultaneous write and execution fault\n");
>>> +               return -EFAULT;
>>> +       }
>>> +
>>> +       if (is_perm && !write_fault && !exec_fault) {
>>> +               kvm_err("Unexpected L2 read permission error\n");
>>> +               return -EFAULT;
>>> +       }
>>
>> I think, ideally, these above checks should be put into a separate
>> function and shared with user_mem_abort(). (The VM_BUG_ON(write_fault
>> && exec_fault) that user_mem_abort() does seems fine to me, I don't see a
>> real need to change it to -EFAULT.)
> 
> I would like to do that, however, I didn't want to change
> user_mem_abort(), and regarding the VM_BUG_ON, see David's feedback to
> V10:
> 
> https://lore.kernel.org/all/ed1928ce-fc6f-4aaa-9f54-126a8af12240@redhat.com/

Worth reading Linus' reply in [1], which contains a bit more history on 
BUG_ON() and how it should not be used. (VM_BUG_ON we'll now likely get 
rid of completely.)

[1] https://lkml.kernel.org/r/20250604140544.688711-1-david@redhat.com

-- 
Cheers,

David / dhildenb




* Re: [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-06-05 22:07   ` James Houghton
  2025-06-05 22:12     ` Sean Christopherson
@ 2025-06-06  8:14     ` Fuad Tabba
  1 sibling, 0 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-06  8:14 UTC (permalink / raw)
  To: James Houghton
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
	pankaj.gupta, ira.weiny

Hi James,

On Thu, 5 Jun 2025 at 23:07, James Houghton <jthoughton@google.com> wrote:
>
> On Thu, Jun 5, 2025 at 8:38 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > Expand the guest_memfd selftests to include testing mapping guest
> > memory for VM types that support it.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
>
> Feel free to add:
>
> Reviewed-by: James Houghton <jthoughton@google.com>

Thanks!

> > ---
> >  .../testing/selftests/kvm/guest_memfd_test.c  | 201 ++++++++++++++++--
> >  1 file changed, 180 insertions(+), 21 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> > index 341ba616cf55..1612d3adcd0d 100644
> > --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> > +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> > @@ -13,6 +13,8 @@
> >
> >  #include <linux/bitmap.h>
> >  #include <linux/falloc.h>
> > +#include <setjmp.h>
> > +#include <signal.h>
> >  #include <sys/mman.h>
> >  #include <sys/types.h>
> >  #include <sys/stat.h>
> > @@ -34,12 +36,83 @@ static void test_file_read_write(int fd)
> >                     "pwrite on a guest_mem fd should fail");
> >  }
> >
> > -static void test_mmap(int fd, size_t page_size)
> > +static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
> > +{
> > +       const char val = 0xaa;
> > +       char *mem;
>
> This must be `volatile char *` to ensure that the compiler doesn't
> elide the accesses you have written.
>
> > +       size_t i;
> > +       int ret;
> > +
> > +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> > +       TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
> > +
> > +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +       TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
> > +
> > +       memset(mem, val, total_size);
>
> Now unfortunately, `memset` and `munmap` will complain about the
> volatile qualification. So...
>
> memset((char *)mem, val, total_size);
>
> Eh... wish they just wouldn't complain, but this is a small price to
> pay for correctness. :)
>
> > +       for (i = 0; i < total_size; i++)
> > +               TEST_ASSERT_EQ(mem[i], val);
>
> The compiler is allowed to[1] elide the read of `mem[i]` and just
> assume that it is `val`.
>
> [1]: https://godbolt.org/z/Wora54bP6
>
> Feel free to add `volatile` to that snippet to see how the code changes.

Having tried both that and Sean's READ_ONCE() suggestion, I went with the
latter. Like Sean said, the accesses aren't optimised out, and it avoids
the need to cast.

> > +
> > +       ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
> > +                       page_size);
> > +       TEST_ASSERT(!ret, "fallocate the first page should succeed.");
> > +
> > +       for (i = 0; i < page_size; i++)
> > +               TEST_ASSERT_EQ(mem[i], 0x00);
> > +       for (; i < total_size; i++)
> > +               TEST_ASSERT_EQ(mem[i], val);
> > +
> > +       memset(mem, val, page_size);
> > +       for (i = 0; i < total_size; i++)
> > +               TEST_ASSERT_EQ(mem[i], val);
> > +
> > +       ret = munmap(mem, total_size);
> > +       TEST_ASSERT(!ret, "munmap() should succeed.");
> > +}
> > +
> > +static sigjmp_buf jmpbuf;
> > +void fault_sigbus_handler(int signum)
> > +{
> > +       siglongjmp(jmpbuf, 1);
> > +}
> > +
> > +static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
> > +{
> > +       struct sigaction sa_old, sa_new = {
> > +               .sa_handler = fault_sigbus_handler,
> > +       };
> > +       size_t map_size = total_size * 4;
> > +       const char val = 0xaa;
> > +       char *mem;
>
> `volatile` here as well.
>
> > +       size_t i;
> > +       int ret;
> > +
> > +       mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +       TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
> > +
> > +       sigaction(SIGBUS, &sa_new, &sa_old);
> > +       if (sigsetjmp(jmpbuf, 1) == 0) {
> > +               memset(mem, 0xaa, map_size);
> > +               TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
> > +       }
> > +       sigaction(SIGBUS, &sa_old, NULL);
> > +
> > +       for (i = 0; i < total_size; i++)
> > +               TEST_ASSERT_EQ(mem[i], val);
> > +
> > +       ret = munmap(mem, map_size);
> > +       TEST_ASSERT(!ret, "munmap() should succeed.");
> > +}
> > +
> > +static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
> >  {
> >         char *mem;
> >
> >         mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> >         TEST_ASSERT_EQ(mem, MAP_FAILED);
> > +
> > +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +       TEST_ASSERT_EQ(mem, MAP_FAILED);
> >  }
> >
> >  static void test_file_size(int fd, size_t page_size, size_t total_size)
> > @@ -120,26 +193,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
> >         }
> >  }
> >
> > -static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
> > +static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
> > +                                                 uint64_t guest_memfd_flags,
> > +                                                 size_t page_size)
> >  {
> > -       size_t page_size = getpagesize();
> > -       uint64_t flag;
> >         size_t size;
> >         int fd;
> >
> >         for (size = 1; size < page_size; size++) {
> > -               fd = __vm_create_guest_memfd(vm, size, 0);
> > -               TEST_ASSERT(fd == -1 && errno == EINVAL,
> > +               fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
> > +               TEST_ASSERT(fd < 0 && errno == EINVAL,
> >                             "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
> >                             size);
> >         }
> > -
> > -       for (flag = BIT(0); flag; flag <<= 1) {
> > -               fd = __vm_create_guest_memfd(vm, page_size, flag);
> > -               TEST_ASSERT(fd == -1 && errno == EINVAL,
> > -                           "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> > -                           flag);
> > -       }
> >  }
> >
> >  static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> > @@ -171,30 +237,123 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> >         close(fd1);
> >  }
> >
> > -int main(int argc, char *argv[])
> > +static bool check_vm_type(unsigned long vm_type)
> >  {
> > -       size_t page_size;
> > +       /*
> > +        * Not all architectures support KVM_CAP_VM_TYPES. However, those that
> > +        * support guest_memfd have that support for the default VM type.
> > +        */
> > +       if (vm_type == VM_TYPE_DEFAULT)
> > +               return true;
> > +
> > +       return kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type);
> > +}
> > +
> > +static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
> > +                          bool expect_mmap_allowed)
> > +{
> > +       struct kvm_vm *vm;
> >         size_t total_size;
> > +       size_t page_size;
> >         int fd;
> > -       struct kvm_vm *vm;
> >
> > -       TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> > +       if (!check_vm_type(vm_type))
> > +               return;
> >
> >         page_size = getpagesize();
> >         total_size = page_size * 4;
> >
> > -       vm = vm_create_barebones();
> > +       vm = vm_create_barebones_type(vm_type);
> >
> > -       test_create_guest_memfd_invalid(vm);
> >         test_create_guest_memfd_multiple(vm);
> > +       test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
> >
> > -       fd = vm_create_guest_memfd(vm, total_size, 0);
> > +       fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
> >
> >         test_file_read_write(fd);
> > -       test_mmap(fd, page_size);
> > +
> > +       if (expect_mmap_allowed) {
> > +               test_mmap_supported(fd, page_size, total_size);
> > +               test_fault_overflow(fd, page_size, total_size);
> > +
> > +       } else {
> > +               test_mmap_not_supported(fd, page_size, total_size);
> > +       }
> > +
> >         test_file_size(fd, page_size, total_size);
> >         test_fallocate(fd, page_size, total_size);
> >         test_invalid_punch_hole(fd, page_size, total_size);
> >
> >         close(fd);
> > +       kvm_vm_release(vm);
>
> I think kvm_vm_free() is probably more appropriate?

Ack (for both).
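
That is, both call sites become

	kvm_vm_free(vm);

which, IIUC, also tears down the memslots and backing memory rather
than just closing the VM and KVM fds the way kvm_vm_release() does.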

Cheers,
/fuad

> > +}
> > +
> > +static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
> > +                                           uint64_t expected_valid_flags)
> > +{
> > +       size_t page_size = getpagesize();
> > +       struct kvm_vm *vm;
> > +       uint64_t flag = 0;
> > +       int fd;
> > +
> > +       if (!check_vm_type(vm_type))
> > +               return;
> > +
> > +       vm = vm_create_barebones_type(vm_type);
> > +
> > +       for (flag = BIT(0); flag; flag <<= 1) {
> > +               fd = __vm_create_guest_memfd(vm, page_size, flag);
> > +
> > +               if (flag & expected_valid_flags) {
> > +                       TEST_ASSERT(fd >= 0,
> > +                                   "guest_memfd() with flag '0x%lx' should be valid",
> > +                                   flag);
> > +                       close(fd);
> > +               } else {
> > +                       TEST_ASSERT(fd < 0 && errno == EINVAL,
> > +                                   "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> > +                                   flag);
> > +               }
> > +       }
> > +
> > +       kvm_vm_release(vm);
>
> Same here.
>
> > +}
> > +
> > +static void test_gmem_flag_validity(void)
> > +{
> > +       uint64_t non_coco_vm_valid_flags = 0;
> > +
> > +       if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
> > +               non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +
> > +       test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
> > +
> > +#ifdef __x86_64__
> > +       test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, non_coco_vm_valid_flags);
> > +       test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
> > +       test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
> > +       test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
> > +       test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
> > +#endif
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +       TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> > +
> > +       test_gmem_flag_validity();
> > +
> > +       test_with_type(VM_TYPE_DEFAULT, 0, false);
> > +       if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> > +               test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_SUPPORT_SHARED,
> > +                              true);
> > +       }
> > +
> > +#ifdef __x86_64__
> > +       test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
> > +       if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> > +               test_with_type(KVM_X86_SW_PROTECTED_VM,
> > +                              GUEST_MEMFD_FLAG_SUPPORT_SHARED, true);
> > +       }
> > +#endif
> >  }
> > --
> > 2.49.0.1266.g31b7d2e469-goog
> >


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test
  2025-06-05 15:37 ` [PATCH v11 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
@ 2025-06-06  8:15   ` David Hildenbrand
  2025-06-08 23:43   ` Gavin Shan
  1 sibling, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2025-06-06  8:15 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 05.06.25 17:37, Fuad Tabba wrote:
> Using hardcoded page size values could cause the test to fail on systems
> that have larger pages, e.g., arm64 with 64kB pages. Use getpagesize()
> instead.
> 
> Also, build the guest_memfd selftest for arm64.
> 
> Suggested-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-06-05 15:37 ` [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
@ 2025-06-06  9:12   ` David Hildenbrand
  2025-06-06  9:30     ` Fuad Tabba
  2025-06-11  6:29     ` Shivank Garg
  2025-06-08 23:42   ` Gavin Shan
  1 sibling, 2 replies; 56+ messages in thread
From: David Hildenbrand @ 2025-06-06  9:12 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 05.06.25 17:37, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory from host userspace.
> 
> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> flag at creation time.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---

[...]

> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> +	u64 flags;
> +
> +	if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> +		return false;
> +
> +	flags = (u64)inode->i_private;

Can probably do above

const u64 flags = (u64)inode->i_private;

> +
> +	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	struct folio *folio;
> +	vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> +		return VM_FAULT_SIGBUS;
> +
> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +	if (IS_ERR(folio)) {
> +		int err = PTR_ERR(folio);
> +
> +		if (err == -EAGAIN)
> +			return VM_FAULT_RETRY;
> +
> +		return vmf_error(err);
> +	}
> +
> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	if (!folio_test_uptodate(folio)) {
> +		clear_highpage(folio_page(folio, 0));
> +		kvm_gmem_mark_prepared(folio);
> +	}
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +	if (ret != VM_FAULT_LOCKED) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
> +	return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +	.fault = kvm_gmem_fault_shared,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	if (!kvm_gmem_supports_shared(file_inode(file)))
> +		return -ENODEV;
> +
> +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> +	    (VM_SHARED | VM_MAYSHARE)) {
> +		return -EINVAL;
> +	}
> +
> +	vma->vm_ops = &kvm_gmem_vm_ops;
> +
> +	return 0;
> +}
> +
>   static struct file_operations kvm_gmem_fops = {
> +	.mmap		= kvm_gmem_mmap,
>   	.open		= generic_file_open,
>   	.release	= kvm_gmem_release,
>   	.fallocate	= kvm_gmem_fallocate,
> @@ -428,6 +500,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>   	}
>   
>   	file->f_flags |= O_LARGEFILE;
> +	allow_write_access(file);

Why is that required?

As the docs mention, it must be paired with a previous deny_write_access().

... and I don't find similar usage anywhere else.

Apart from that here

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/18] KVM: guest_memfd: Track shared memory support in memslot
  2025-06-05 15:37 ` [PATCH v11 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
@ 2025-06-06  9:13   ` David Hildenbrand
  2025-06-08 23:42   ` Gavin Shan
  1 sibling, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2025-06-06  9:13 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 05.06.25 17:37, Fuad Tabba wrote:
> Add a new internal flag in the top half of memslot->flags to track when
> a guest_memfd-backed slot supports shared memory, which is reserved for
> internal use in KVM.
> 
> This avoids repeatedly checking the underlying guest_memfd file for
> shared memory support, which requires taking a reference on the file.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level
  2025-06-05 15:37 ` [PATCH v11 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
@ 2025-06-06  9:14   ` David Hildenbrand
  2025-06-06  9:48     ` Fuad Tabba
  0 siblings, 1 reply; 56+ messages in thread
From: David Hildenbrand @ 2025-06-06  9:14 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 05.06.25 17:37, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> This patch adds kvm_gmem_max_mapping_level(), which always returns
> PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
> 
> When guest_memfd supports shared memory, max_mapping_level (especially
> when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
> from recover_huge_pages_range()) should take input from
> guest_memfd.
> 
> Input from guest_memfd should be taken in these cases:
> 
> + if the memslot supports shared memory (guest_memfd is used for
>    shared memory, or in future both shared and private memory) or
> + if the memslot is only used for private memory and that gfn is
>    private.
> 
> If the memslot doesn't use guest_memfd, figure out the
> max_mapping_level using the host page tables like before.
> 
> This patch also refactors and inlines the other call to
> __kvm_mmu_max_mapping_level().
> 
> In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
> provided (if applicable) in fault->max_level. Hence, there is no need
> to query guest_memfd.
> 
> lpage_info is queried like before, and then if the fault is not from
> guest_memfd, adjust fault->req_level based on input from host page
> tables.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---

[...]

>   static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>   {
>   	return false;
> @@ -2561,6 +2565,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>   int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>   		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>   		     int *max_order);
> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
>   #else
>   static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>   				   struct kvm_memory_slot *slot, gfn_t gfn,
> @@ -2570,6 +2575,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>   	KVM_BUG_ON(1, kvm);
>   	return -EIO;
>   }
> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
> +					 gfn_t gfn)
> +{
> +	BUG();
> +	return 0;

As raised, no BUG(). If this is unreachable for these configs,

BUILD_BUG() might do.

Apart from that

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
  2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (17 preceding siblings ...)
  2025-06-05 15:38 ` [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
@ 2025-06-06  9:18 ` David Hildenbrand
  18 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2025-06-06  9:18 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny

On 05.06.25 17:37, Fuad Tabba wrote:
> Main changes since v10 [1]:
> - Added bounds checking when faulting a shared page into the host, along
>    with a selftest to verify the check.
> - Refactored KVM/arm64's handling of guest faults (user_mem_abort()).
>    I've dropped the Reviewed-by tags from "KVM: arm64: Refactor
>    user_mem_abort()..." since it has changed significantly.
> - Handled nested virtualization in KVM/arm64 when faulting guest_memfd
>    backed pages into the guest.
> - Addressed various points of feedback from the last revision.
> - Still based on Linux 6.15
> 
> This patch series enables the mapping of guest_memfd backed memory in
> the host. This is useful for VMMs like Firecracker that aim to run
> guests entirely backed by guest_memfd [2]. When combined with Patrick's
> series for direct map removal [3], this provides additional hardening
> against Spectre-like transient execution attacks.
> 
> This series also lays the groundwork for restricted mmap() support for
> guest_memfd backed memory in the host for Confidential Computing
> platforms that permit in-place sharing of guest memory with the host
> [4].
> 
> Patch breakdown:
> 
> Patches 1-7: Primarily refactoring and renaming to decouple the concept
> of guest memory being "private" from it being backed by guest_memfd.
> 
> Patches 8-9: Add support for in-place shared memory and the ability for
> the host to map it. This is gated by a new configuration option, toggled
> by a new flag, and advertised to userspace by a new capability
> (introduced in patch 16).
> 
> Patches 10-15: Implement the x86 and arm64 support for this feature.
> 
> Patch 16: Introduces the new capability to advertise this support and
> updates the documentation.
> 
> Patches 17-18: Add and fix selftests for the new functionality.
> 
> For details on how to test this patch series, and on how to boot a guest
> that uses the new features, please refer to v8 [5].

Paolo et al.,

I only found some smaller things; this is looking mostly good to me.

... worth having a look ;)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-06-06  9:12   ` David Hildenbrand
@ 2025-06-06  9:30     ` Fuad Tabba
  2025-06-06  9:55       ` David Hildenbrand
  2025-06-11  6:29     ` Shivank Garg
  1 sibling, 1 reply; 56+ messages in thread
From: Fuad Tabba @ 2025-06-06  9:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi David,

On Fri, 6 Jun 2025 at 10:12, David Hildenbrand <david@redhat.com> wrote:
>
> On 05.06.25 17:37, Fuad Tabba wrote:
> > This patch enables support for shared memory in guest_memfd, including
> > mapping that memory from host userspace.
> >
> > This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> > and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> > flag at creation time.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
>
> [...]
>
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > +     u64 flags;
> > +
> > +     if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> > +             return false;
> > +
> > +     flags = (u64)inode->i_private;
>
> Can probably do above
>
> const u64 flags = (u64)inode->i_private;
>

Ack.

> > +
> > +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> > +{
> > +     struct inode *inode = file_inode(vmf->vma->vm_file);
> > +     struct folio *folio;
> > +     vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +     if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> > +             return VM_FAULT_SIGBUS;
> > +
> > +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +     if (IS_ERR(folio)) {
> > +             int err = PTR_ERR(folio);
> > +
> > +             if (err == -EAGAIN)
> > +                     return VM_FAULT_RETRY;
> > +
> > +             return vmf_error(err);
> > +     }
> > +
> > +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +             ret = VM_FAULT_SIGBUS;
> > +             goto out_folio;
> > +     }
> > +
> > +     if (!folio_test_uptodate(folio)) {
> > +             clear_highpage(folio_page(folio, 0));
> > +             kvm_gmem_mark_prepared(folio);
> > +     }
> > +
> > +     vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +     if (ret != VM_FAULT_LOCKED) {
> > +             folio_unlock(folio);
> > +             folio_put(folio);
> > +     }
> > +
> > +     return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > +     .fault = kvm_gmem_fault_shared,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +     if (!kvm_gmem_supports_shared(file_inode(file)))
> > +             return -ENODEV;
> > +
> > +     if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > +         (VM_SHARED | VM_MAYSHARE)) {
> > +             return -EINVAL;
> > +     }
> > +
> > +     vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > +     return 0;
> > +}
> > +
> >   static struct file_operations kvm_gmem_fops = {
> > +     .mmap           = kvm_gmem_mmap,
> >       .open           = generic_file_open,
> >       .release        = kvm_gmem_release,
> >       .fallocate      = kvm_gmem_fallocate,
> > @@ -428,6 +500,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> >       }
> >
> >       file->f_flags |= O_LARGEFILE;
> > +     allow_write_access(file);
>
> Why is that required?
>
> As the docs mention, it must be paired with a previous deny_write_access().
>
> ... and I don't find similar usage anywhere else.

This is to address Gavin's concern [*] regarding MADV_COLLAPSE, which
isn't an issue until hugepage support is enabled. Should we wait until
we have hugepage support?

[*] https://lore.kernel.org/all/a3d6ff25-236b-4dfd-8a04-6df437ecb4bb@redhat.com/


> Apart from that here
>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!
/fuad

> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level
  2025-06-06  9:14   ` David Hildenbrand
@ 2025-06-06  9:48     ` Fuad Tabba
  0 siblings, 0 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-06  9:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Fri, 6 Jun 2025 at 10:14, David Hildenbrand <david@redhat.com> wrote:
>
> On 05.06.25 17:37, Fuad Tabba wrote:
> > From: Ackerley Tng <ackerleytng@google.com>
> >
> > This patch adds kvm_gmem_max_mapping_level(), which always returns
> > PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
> >
> > When guest_memfd supports shared memory, max_mapping_level (especially
> > when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
> > from recover_huge_pages_range()) should take input from
> > guest_memfd.
> >
> > Input from guest_memfd should be taken in these cases:
> >
> > + if the memslot supports shared memory (guest_memfd is used for
> >    shared memory, or in future both shared and private memory) or
> > + if the memslot is only used for private memory and that gfn is
> >    private.
> >
> > If the memslot doesn't use guest_memfd, figure out the
> > max_mapping_level using the host page tables like before.
> >
> > This patch also refactors and inlines the other call to
> > __kvm_mmu_max_mapping_level().
> >
> > In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
> > provided (if applicable) in fault->max_level. Hence, there is no need
> > to query guest_memfd.
> >
> > lpage_info is queried like before, and then if the fault is not from
> > guest_memfd, adjust fault->req_level based on input from host page
> > tables.
> >
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Co-developed-by: Fuad Tabba <tabba@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
>
> [...]
>
> >   static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >   {
> >       return false;
> > @@ -2561,6 +2565,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >   int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >                    gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
> >                    int *max_order);
> > +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
> >   #else
> >   static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >                                  struct kvm_memory_slot *slot, gfn_t gfn,
> > @@ -2570,6 +2575,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >       KVM_BUG_ON(1, kvm);
> >       return -EIO;
> >   }
> > +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
> > +                                      gfn_t gfn)
> > +{
> > +     BUG();
> > +     return 0;
>
> As raised, no BUG(). If this is unreachable for these configs,
>
> BUILD_BUG() might do.
>

Ack.
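
The stub for configs without guest_memfd support would then look
something like this (an untested sketch; it relies on all callers being
compiled out in those configs, so the BUILD_BUG() can never fire):

	static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
						 gfn_t gfn)
	{
		BUILD_BUG();
		return 0;
	}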

> Apart from that
>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!
/fuad

> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-06-06  9:30     ` Fuad Tabba
@ 2025-06-06  9:55       ` David Hildenbrand
  2025-06-06 10:33         ` Fuad Tabba
  0 siblings, 1 reply; 56+ messages in thread
From: David Hildenbrand @ 2025-06-06  9:55 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 06.06.25 11:30, Fuad Tabba wrote:
> Hi David,
> 
> On Fri, 6 Jun 2025 at 10:12, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 05.06.25 17:37, Fuad Tabba wrote:
>>> This patch enables support for shared memory in guest_memfd, including
>>> mapping that memory from host userspace.
>>>
>>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
>>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
>>> flag at creation time.
>>>
>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>
>> [...]
>>
>>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>>> +{
>>> +     u64 flags;
>>> +
>>> +     if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
>>> +             return false;
>>> +
>>> +     flags = (u64)inode->i_private;
>>
>> Can probably do above
>>
>> const u64 flags = (u64)inode->i_private;
>>
> 
> Ack.
> 
>>> +
>>> +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>>> +}
>>> +
>>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>>> +{
>>> +     struct inode *inode = file_inode(vmf->vma->vm_file);
>>> +     struct folio *folio;
>>> +     vm_fault_t ret = VM_FAULT_LOCKED;
>>> +
>>> +     if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>>> +             return VM_FAULT_SIGBUS;
>>> +
>>> +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>> +     if (IS_ERR(folio)) {
>>> +             int err = PTR_ERR(folio);
>>> +
>>> +             if (err == -EAGAIN)
>>> +                     return VM_FAULT_RETRY;
>>> +
>>> +             return vmf_error(err);
>>> +     }
>>> +
>>> +     if (WARN_ON_ONCE(folio_test_large(folio))) {
>>> +             ret = VM_FAULT_SIGBUS;
>>> +             goto out_folio;
>>> +     }
>>> +
>>> +     if (!folio_test_uptodate(folio)) {
>>> +             clear_highpage(folio_page(folio, 0));
>>> +             kvm_gmem_mark_prepared(folio);
>>> +     }
>>> +
>>> +     vmf->page = folio_file_page(folio, vmf->pgoff);
>>> +
>>> +out_folio:
>>> +     if (ret != VM_FAULT_LOCKED) {
>>> +             folio_unlock(folio);
>>> +             folio_put(folio);
>>> +     }
>>> +
>>> +     return ret;
>>> +}
>>> +
>>> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
>>> +     .fault = kvm_gmem_fault_shared,
>>> +};
>>> +
>>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>> +{
>>> +     if (!kvm_gmem_supports_shared(file_inode(file)))
>>> +             return -ENODEV;
>>> +
>>> +     if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>>> +         (VM_SHARED | VM_MAYSHARE)) {
>>> +             return -EINVAL;
>>> +     }
>>> +
>>> +     vma->vm_ops = &kvm_gmem_vm_ops;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>>    static struct file_operations kvm_gmem_fops = {
>>> +     .mmap           = kvm_gmem_mmap,
>>>        .open           = generic_file_open,
>>>        .release        = kvm_gmem_release,
>>>        .fallocate      = kvm_gmem_fallocate,
>>> @@ -428,6 +500,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>>>        }
>>>
>>>        file->f_flags |= O_LARGEFILE;
>>> +     allow_write_access(file);
>>
>> Why is that required?
>>
>> As the docs mention, it must be paired with a previous deny_write_access().
>>
>> ... and I don't find similar usage anywhere else.
> 
> This is to address Gavin's concern [*] regarding MADV_COLLAPSE, which
> isn't an issue until hugepage support is enabled. Should we wait until
> we have hugepage support?

If we keep this, we *definitely* need a comment explaining why we do
something nobody else does.

But I don't think allow_write_access() would ever be the way we want to 
fence off MADV_COLLAPSE. :) Maybe AS_INACCESSIBLE or something like that 
could fence it off in file_thp_enabled().

Fortunately, CONFIG_READ_ONLY_THP_FOR_FS might vanish at some point ... 
so I've been told.

So if it's not done for secretmem or the others for now, I think we
shouldn't be doing it for now either.
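
(To illustrate, an untested sketch, assuming guest_memfd keeps setting
AS_INACCESSIBLE on its mapping: file_thp_enabled() could simply refuse
such mappings,

	/* hypothetical check in file_thp_enabled() */
	if (test_bit(AS_INACCESSIBLE, &vma->vm_file->f_mapping->flags))
		return false;

but that really belongs in the hugepage series, not this one.)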

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-06-06  9:55       ` David Hildenbrand
@ 2025-06-06 10:33         ` Fuad Tabba
  0 siblings, 0 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-06 10:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Fri, 6 Jun 2025 at 10:55, David Hildenbrand <david@redhat.com> wrote:
>
> On 06.06.25 11:30, Fuad Tabba wrote:
> > Hi David,
> >
> > On Fri, 6 Jun 2025 at 10:12, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 05.06.25 17:37, Fuad Tabba wrote:
> >>> This patch enables support for shared memory in guest_memfd, including
> >>> mapping that memory from host userspace.
> >>>
> >>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> >>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> >>> flag at creation time.
> >>>
> >>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>> ---
> >>
> >> [...]
> >>
> >>> +static bool kvm_gmem_supports_shared(struct inode *inode)
> >>> +{
> >>> +     u64 flags;
> >>> +
> >>> +     if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> >>> +             return false;
> >>> +
> >>> +     flags = (u64)inode->i_private;
> >>
> >> Can probably do above
> >>
> >> const u64 flags = (u64)inode->i_private;
> >>
> >
> > Ack.
> >
> >>> +
> >>> +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> >>> +}
> >>> +
> >>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> >>> +{
> >>> +     struct inode *inode = file_inode(vmf->vma->vm_file);
> >>> +     struct folio *folio;
> >>> +     vm_fault_t ret = VM_FAULT_LOCKED;
> >>> +
> >>> +     if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> >>> +             return VM_FAULT_SIGBUS;
> >>> +
> >>> +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> >>> +     if (IS_ERR(folio)) {
> >>> +             int err = PTR_ERR(folio);
> >>> +
> >>> +             if (err == -EAGAIN)
> >>> +                     return VM_FAULT_RETRY;
> >>> +
> >>> +             return vmf_error(err);
> >>> +     }
> >>> +
> >>> +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> >>> +             ret = VM_FAULT_SIGBUS;
> >>> +             goto out_folio;
> >>> +     }
> >>> +
> >>> +     if (!folio_test_uptodate(folio)) {
> >>> +             clear_highpage(folio_page(folio, 0));
> >>> +             kvm_gmem_mark_prepared(folio);
> >>> +     }
> >>> +
> >>> +     vmf->page = folio_file_page(folio, vmf->pgoff);
> >>> +
> >>> +out_folio:
> >>> +     if (ret != VM_FAULT_LOCKED) {
> >>> +             folio_unlock(folio);
> >>> +             folio_put(folio);
> >>> +     }
> >>> +
> >>> +     return ret;
> >>> +}
> >>> +
> >>> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> >>> +     .fault = kvm_gmem_fault_shared,
> >>> +};
> >>> +
> >>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> >>> +{
> >>> +     if (!kvm_gmem_supports_shared(file_inode(file)))
> >>> +             return -ENODEV;
> >>> +
> >>> +     if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> >>> +         (VM_SHARED | VM_MAYSHARE)) {
> >>> +             return -EINVAL;
> >>> +     }
> >>> +
> >>> +     vma->vm_ops = &kvm_gmem_vm_ops;
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>>    static struct file_operations kvm_gmem_fops = {
> >>> +     .mmap           = kvm_gmem_mmap,
> >>>        .open           = generic_file_open,
> >>>        .release        = kvm_gmem_release,
> >>>        .fallocate      = kvm_gmem_fallocate,
> >>> @@ -428,6 +500,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> >>>        }
> >>>
> >>>        file->f_flags |= O_LARGEFILE;
> >>> +     allow_write_access(file);
> >>
> >> Why is that required?
> >>
> >> As the docs mention, it must be paired with a previous deny_write_access().
> >>
> >> ... and I don't find similar usage anywhere else.
> >
> > This is to address Gavin's concern [*] regarding MADV_COLLAPSE, which
> > isn't an issue until hugepage support is enabled. Should we wait until
> > we have hugepage support?
>
> If we keep this, we *definitely* need a comment why we do something
> nobody else does.
>
> But I don't think allow_write_access() would ever be the way we want to
> fence off MADV_COLLAPSE. :) Maybe AS_INACCESSIBLE or sth. like that
> could fence it off in file_thp_enabled().
>
> Fortunately, CONFIG_READ_ONLY_THP_FOR_FS might vanish at some point ...
> so I've been told.
>
> So if it's not done for secretmem for now or others, we also shouldn't
> be doing it for now I think.

I'll remove it.

Thanks!
/fuad

> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-06-05 15:37 ` [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
  2025-06-06  9:12   ` David Hildenbrand
@ 2025-06-08 23:42   ` Gavin Shan
  1 sibling, 0 replies; 56+ messages in thread
From: Gavin Shan @ 2025-06-08 23:42 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 6/6/25 1:37 AM, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory from host userspace.
> 
> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> flag at creation time.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   include/linux/kvm_host.h | 13 +++++++
>   include/uapi/linux/kvm.h |  1 +
>   virt/kvm/Kconfig         |  4 +++
>   virt/kvm/guest_memfd.c   | 76 ++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 94 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/18] KVM: guest_memfd: Track shared memory support in memslot
  2025-06-05 15:37 ` [PATCH v11 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
  2025-06-06  9:13   ` David Hildenbrand
@ 2025-06-08 23:42   ` Gavin Shan
  1 sibling, 0 replies; 56+ messages in thread
From: Gavin Shan @ 2025-06-08 23:42 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 6/6/25 1:37 AM, Fuad Tabba wrote:
> Add a new internal flag in the top half of memslot->flags to track when
> a guest_memfd-backed slot supports shared memory, which is reserved for
> internal use in KVM.
> 
> This avoids repeatedly checking the underlying guest_memfd file for
> shared memory support, which requires taking a reference on the file.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   include/linux/kvm_host.h | 11 ++++++++++-
>   virt/kvm/guest_memfd.c   |  2 ++
>   2 files changed, 12 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test
  2025-06-05 15:37 ` [PATCH v11 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
  2025-06-06  8:15   ` David Hildenbrand
@ 2025-06-08 23:43   ` Gavin Shan
  1 sibling, 0 replies; 56+ messages in thread
From: Gavin Shan @ 2025-06-08 23:43 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 6/6/25 1:37 AM, Fuad Tabba wrote:
> Using hardcoded page size values could cause the test to fail on systems
> that have larger pages, e.g., arm64 with 64kB pages. Use getpagesize()
> instead.
> 
> Also, build the guest_memfd selftest for arm64.
> 
> Suggested-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   tools/testing/selftests/kvm/Makefile.kvm       |  1 +
>   tools/testing/selftests/kvm/guest_memfd_test.c | 11 ++++++-----
>   2 files changed, 7 insertions(+), 5 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-06-05 15:38 ` [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
  2025-06-05 22:07   ` James Houghton
@ 2025-06-08 23:43   ` Gavin Shan
  2025-06-09  7:06     ` Fuad Tabba
  1 sibling, 1 reply; 56+ messages in thread
From: Gavin Shan @ 2025-06-08 23:43 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 6/6/25 1:38 AM, Fuad Tabba wrote:
> Expand the guest_memfd selftests to include testing mapping guest
> memory for VM types that support it.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   .../testing/selftests/kvm/guest_memfd_test.c  | 201 ++++++++++++++++--
>   1 file changed, 180 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 341ba616cf55..1612d3adcd0d 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -13,6 +13,8 @@
>   

Reviewed-by: Gavin Shan <gshan@redhat.com>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 13/18] KVM: arm64: Refactor user_mem_abort()
  2025-06-05 15:37 ` [PATCH v11 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
@ 2025-06-09  0:27   ` Gavin Shan
  2025-06-09  7:01     ` Fuad Tabba
  0 siblings, 1 reply; 56+ messages in thread
From: Gavin Shan @ 2025-06-09  0:27 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

Hi Fuad,

On 6/6/25 1:37 AM, Fuad Tabba wrote:
> To simplify the code and to make the assumptions clearer,
> refactor user_mem_abort() by immediately setting force_pte to
> true if the conditions are met.
> 
> Remove the comment about logging_active being guaranteed to never be
> true for VM_PFNMAP memslots, since it's not actually correct.
> 
> Move code that will be reused in the following patch into separate
> functions.
> 
> Other small instances of tidying up.
> 
> No functional change intended.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/arm64/kvm/mmu.c | 100 ++++++++++++++++++++++++-------------------
>   1 file changed, 55 insertions(+), 45 deletions(-)
> 

One nitpick below in case v12 is needed. Either way, it looks good to me:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index eeda92330ade..ce80be116a30 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1466,13 +1466,56 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>   	return vma->vm_flags & VM_MTE_ALLOWED;
>   }
>   
> +static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
> +				void **memcache)
> +{
> +	int min_pages;
> +
> +	if (!is_protected_kvm_enabled())
> +		*memcache = &vcpu->arch.mmu_page_cache;
> +	else
> +		*memcache = &vcpu->arch.pkvm_memcache;
> +
> +	if (!topup_memcache)
> +		return 0;
> +

It's unnecessary to initialize 'memcache' when topup_memcache is false.

	if (!topup_memcache)
		return 0;

	min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
	if (!is_protected_kvm_enabled())
		*memcache = &vcpu->arch.mmu_page_cache;
	else
		*memcache = &vcpu->arch.pkvm_memcache;

Thanks,
Gavin

> +	min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
> +
> +	if (!is_protected_kvm_enabled())
> +		return kvm_mmu_topup_memory_cache(*memcache, min_pages);
> +
> +	return topup_hyp_memcache(*memcache, min_pages);
> +}
> +
> +/*
> + * Potentially reduce shadow S2 permissions to match the guest's own S2. For
> + * exec faults, we'd only reach this point if the guest actually allowed it (see
> + * kvm_s2_handle_perm_fault).
> + *
> + * Also encode the level of the original translation in the SW bits of the leaf
> + * entry as a proxy for the span of that translation. This will be retrieved on
> + * TLB invalidation from the guest and used to limit the invalidation scope if a
> + * TTL hint or a range isn't provided.
> + */
> +static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
> +				      enum kvm_pgtable_prot *prot,
> +				      bool *writable)
> +{
> +	*writable &= kvm_s2_trans_writable(nested);
> +	if (!kvm_s2_trans_readable(nested))
> +		*prot &= ~KVM_PGTABLE_PROT_R;
> +
> +	*prot |= kvm_encode_nested_level(nested);
> +}
> +
>   static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			  struct kvm_s2_trans *nested,
>   			  struct kvm_memory_slot *memslot, unsigned long hva,
>   			  bool fault_is_perm)
>   {
>   	int ret = 0;
> -	bool write_fault, writable, force_pte = false;
> +	bool topup_memcache;
> +	bool write_fault, writable;
>   	bool exec_fault, mte_allowed;
>   	bool device = false, vfio_allow_any_uc = false;
>   	unsigned long mmu_seq;
> @@ -1484,6 +1527,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	gfn_t gfn;
>   	kvm_pfn_t pfn;
>   	bool logging_active = memslot_is_logging(memslot);
> +	bool force_pte = logging_active || is_protected_kvm_enabled();
>   	long vma_pagesize, fault_granule;
>   	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>   	struct kvm_pgtable *pgt;
> @@ -1501,28 +1545,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   		return -EFAULT;
>   	}
>   
> -	if (!is_protected_kvm_enabled())
> -		memcache = &vcpu->arch.mmu_page_cache;
> -	else
> -		memcache = &vcpu->arch.pkvm_memcache;
> -
>   	/*
>   	 * Permission faults just need to update the existing leaf entry,
>   	 * and so normally don't require allocations from the memcache. The
>   	 * only exception to this is when dirty logging is enabled at runtime
>   	 * and a write fault needs to collapse a block entry into a table.
>   	 */
> -	if (!fault_is_perm || (logging_active && write_fault)) {
> -		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
> -
> -		if (!is_protected_kvm_enabled())
> -			ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
> -		else
> -			ret = topup_hyp_memcache(memcache, min_pages);
> -
> -		if (ret)
> -			return ret;
> -	}
> +	topup_memcache = !fault_is_perm || (logging_active && write_fault);
> +	ret = prepare_mmu_memcache(vcpu, topup_memcache, &memcache);
> +	if (ret)
> +		return ret;
>   
>   	/*
>   	 * Let's check if we will get back a huge page backed by hugetlbfs, or
> @@ -1536,16 +1568,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   		return -EFAULT;
>   	}
>   
> -	/*
> -	 * logging_active is guaranteed to never be true for VM_PFNMAP
> -	 * memslots.
> -	 */
> -	if (logging_active || is_protected_kvm_enabled()) {
> -		force_pte = true;
> +	if (force_pte)
>   		vma_shift = PAGE_SHIFT;
> -	} else {
> +	else
>   		vma_shift = get_vma_page_shift(vma, hva);
> -	}
>   
>   	switch (vma_shift) {
>   #ifndef __PAGETABLE_PMD_FOLDED
> @@ -1597,7 +1623,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			max_map_size = PAGE_SIZE;
>   
>   		force_pte = (max_map_size == PAGE_SIZE);
> -		vma_pagesize = min(vma_pagesize, (long)max_map_size);
> +		vma_pagesize = min_t(long, vma_pagesize, max_map_size);
>   	}
>   
>   	/*
> @@ -1626,7 +1652,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
>   	 * with the smp_wmb() in kvm_mmu_invalidate_end().
>   	 */
> -	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> +	mmu_seq = kvm->mmu_invalidate_seq;
>   	mmap_read_unlock(current->mm);
>   
>   	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> @@ -1661,24 +1687,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	if (exec_fault && device)
>   		return -ENOEXEC;
>   
> -	/*
> -	 * Potentially reduce shadow S2 permissions to match the guest's own
> -	 * S2. For exec faults, we'd only reach this point if the guest
> -	 * actually allowed it (see kvm_s2_handle_perm_fault).
> -	 *
> -	 * Also encode the level of the original translation in the SW bits
> -	 * of the leaf entry as a proxy for the span of that translation.
> -	 * This will be retrieved on TLB invalidation from the guest and
> -	 * used to limit the invalidation scope if a TTL hint or a range
> -	 * isn't provided.
> -	 */
> -	if (nested) {
> -		writable &= kvm_s2_trans_writable(nested);
> -		if (!kvm_s2_trans_readable(nested))
> -			prot &= ~KVM_PGTABLE_PROT_R;
> -
> -		prot |= kvm_encode_nested_level(nested);
> -	}
> +	if (nested)
> +		adjust_nested_fault_perms(nested, &prot, &writable);
>   
>   	kvm_fault_lock(kvm);
>   	pgt = vcpu->arch.hw_mmu->pgt;



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory
  2025-06-05 15:37 ` [PATCH v11 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
  2025-06-05 17:26   ` James Houghton
@ 2025-06-09  0:29   ` Gavin Shan
  1 sibling, 0 replies; 56+ messages in thread
From: Gavin Shan @ 2025-06-09  0:29 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 6/6/25 1:37 AM, Fuad Tabba wrote:
> Enable the host mapping of guest_memfd-backed memory on arm64.
> 
> This applies to all current arm64 VM types that support guest_memfd.
> Future VM types can restrict this behavior via the
> kvm_arch_gmem_supports_shared_mem() hook if needed.
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/arm64/include/asm/kvm_host.h | 5 +++++
>   arch/arm64/kvm/Kconfig            | 1 +
>   arch/arm64/kvm/mmu.c              | 7 +++++++
>   3 files changed, 13 insertions(+)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>




* Re: [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-06-05 15:37 ` [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
  2025-06-05 17:21   ` James Houghton
@ 2025-06-09  4:08   ` Gavin Shan
  2025-06-09  7:04     ` Fuad Tabba
  1 sibling, 1 reply; 56+ messages in thread
From: Gavin Shan @ 2025-06-09  4:08 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

Hi Fuad,

On 6/6/25 1:37 AM, Fuad Tabba wrote:
> Add arm64 support for handling guest page faults on guest_memfd backed
> memslots. Until guest_memfd supports huge pages, the fault granule is
> restricted to PAGE_SIZE.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/arm64/kvm/mmu.c | 93 ++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 90 insertions(+), 3 deletions(-)
> 

One comment below. Otherwise, it looks good to me.

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ce80be116a30..f14925fe6144 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1508,6 +1508,89 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
>   	*prot |= kvm_encode_nested_level(nested);
>   }
>   
> +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> +
> +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +		      struct kvm_s2_trans *nested,
> +		      struct kvm_memory_slot *memslot, bool is_perm)
> +{
> +	bool logging, write_fault, exec_fault, writable;
> +	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> +	struct page *page;
> +	struct kvm *kvm = vcpu->kvm;
> +	void *memcache;
> +	kvm_pfn_t pfn;
> +	gfn_t gfn;
> +	int ret;
> +
> +	ret = prepare_mmu_memcache(vcpu, !is_perm, &memcache);
> +	if (ret)
> +		return ret;
> +
> +	if (nested)
> +		gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> +	else
> +		gfn = fault_ipa >> PAGE_SHIFT;
> +
> +	logging = memslot_is_logging(memslot);
> +	write_fault = kvm_is_write_fault(vcpu);
> +	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> +
> +	if (write_fault && exec_fault) {
> +		kvm_err("Simultaneous write and execution fault\n");
> +		return -EFAULT;
> +	}
> +
> +	if (is_perm && !write_fault && !exec_fault) {
> +		kvm_err("Unexpected L2 read permission error\n");
> +		return -EFAULT;
> +	}
> +
> +	ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> +	if (ret) {
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> +					      write_fault, exec_fault, false);
> +		return ret;
> +	}
> +

Only -EFAULT or -EHWPOISON should be returned here, as documented in
virt/kvm/api.rst. Besides, shouldn't kvm_send_hwpoison_signal() be executed
when -EHWPOISON is returned from kvm_gmem_get_pfn()? :-)
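
Something along these lines, maybe (an untested sketch that mirrors how
user_mem_abort() reacts to KVM_PFN_ERR_HWPOISON; note that
kvm_send_hwpoison_signal() takes an hva, which this path would still
need to come up with):

	ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
	if (ret == -EHWPOISON) {
		/* user_mem_abort() signals the VMM rather than failing */
		kvm_send_hwpoison_signal(hva, PAGE_SHIFT); /* no hva here, though */
		return 0;
	}
	if (ret) {
		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
					      write_fault, exec_fault, false);
		return -EFAULT;
	}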

Thanks,
Gavin

> +	writable = !(memslot->flags & KVM_MEM_READONLY) &&
> +		   (!logging || write_fault);
> +
> +	if (nested)
> +		adjust_nested_fault_perms(nested, &prot, &writable);
> +
> +	if (writable)
> +		prot |= KVM_PGTABLE_PROT_W;
> +
> +	if (exec_fault ||
> +	    (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
> +	     (!nested || kvm_s2_trans_executable(nested))))
> +		prot |= KVM_PGTABLE_PROT_X;
> +
> +	kvm_fault_lock(kvm);
> +	if (is_perm) {
> +		/*
> +		 * Drop the SW bits in favour of those stored in the
> +		 * PTE, which will be preserved.
> +		 */
> +		prot &= ~KVM_NV_GUEST_MAP_SZ;
> +		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
> +	} else {
> +		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
> +					     __pfn_to_phys(pfn), prot,
> +					     memcache, flags);
> +	}
> +	kvm_release_faultin_page(kvm, page, !!ret, writable);
> +	kvm_fault_unlock(kvm);
> +
> +	if (writable && !ret)
> +		mark_page_dirty_in_slot(kvm, memslot, gfn);
> +
> +	return ret != -EAGAIN ? ret : 0;
> +}
> +
>   static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			  struct kvm_s2_trans *nested,
>   			  struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1532,7 +1615,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>   	struct kvm_pgtable *pgt;
>   	struct page *page;
> -	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
> +	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
>   
>   	if (fault_is_perm)
>   		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
> @@ -1959,8 +2042,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>   		goto out_unlock;
>   	}
>   
> -	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> -			     esr_fsc_is_permission_fault(esr));
> +	if (kvm_slot_has_gmem(memslot))
> +		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> +				 esr_fsc_is_permission_fault(esr));
> +	else
> +		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> +				     esr_fsc_is_permission_fault(esr));
>   	if (ret == 0)
>   		ret = 1;
>   out:




* Re: [PATCH v11 13/18] KVM: arm64: Refactor user_mem_abort()
  2025-06-09  0:27   ` Gavin Shan
@ 2025-06-09  7:01     ` Fuad Tabba
  2025-06-09  9:02       ` Gavin Shan
  0 siblings, 1 reply; 56+ messages in thread
From: Fuad Tabba @ 2025-06-09  7:01 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

Hi Gavin,

On Mon, 9 Jun 2025 at 01:27, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Fuad,
>
> On 6/6/25 1:37 AM, Fuad Tabba wrote:
> > To simplify the code and to make the assumptions clearer,
> > refactor user_mem_abort() by immediately setting force_pte to
> > true if the conditions are met.
> >
> > Remove the comment about logging_active being guaranteed to never be
> > true for VM_PFNMAP memslots, since it's not actually correct.
> >
> > Move code that will be reused in the following patch into separate
> > functions.
> >
> > Other small instances of tidying up.
> >
> > No functional change intended.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/arm64/kvm/mmu.c | 100 ++++++++++++++++++++++++-------------------
> >   1 file changed, 55 insertions(+), 45 deletions(-)
> >
>
> One nitpick below in case v12 is needed. Either way, it looks good to me:
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
>
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index eeda92330ade..ce80be116a30 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1466,13 +1466,56 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> >       return vma->vm_flags & VM_MTE_ALLOWED;
> >   }
> >
> > +static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
> > +                             void **memcache)
> > +{
> > +     int min_pages;
> > +
> > +     if (!is_protected_kvm_enabled())
> > +             *memcache = &vcpu->arch.mmu_page_cache;
> > +     else
> > +             *memcache = &vcpu->arch.pkvm_memcache;
> > +
> > +     if (!topup_memcache)
> > +             return 0;
> > +
>
> It's unnecessary to initialize 'memcache' when topup_memcache is false.

I thought about this before, and I _think_ you're right. However, I
couldn't completely convince myself that the code would always be
functionally equivalent (looking at the condition for
kvm_pgtable_stage2_relax_perms() at the end of the function), which is
why, if I were to do that, I'd do it as a separate patch.
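
(For reference, the tail of user_mem_abort() goes roughly like:

	if (fault_is_perm && vma_pagesize == fault_granule)
		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa,
								 prot, flags);
	else
		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa,
					     vma_pagesize, __pfn_to_phys(pfn),
					     prot, memcache, flags);

so skipping the 'memcache' assignment is only safe if the relax_perms
branch is provably the one taken whenever the topup is skipped.)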

Thanks,
/fuad

>         if (!topup_memcache)
>                 return 0;
>
>         min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
>         if (!is_protected_kvm_enabled())
>                 *memcache = &vcpu->arch.mmu_page_cache;
>         else
>                 *memcache = &vcpu->arch.pkvm_memcache;
>
> Thanks,
> Gavin
>
> > +     min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
> > +
> > +     if (!is_protected_kvm_enabled())
> > +             return kvm_mmu_topup_memory_cache(*memcache, min_pages);
> > +
> > +     return topup_hyp_memcache(*memcache, min_pages);
> > +}
> > +
> > +/*
> > + * Potentially reduce shadow S2 permissions to match the guest's own S2. For
> > + * exec faults, we'd only reach this point if the guest actually allowed it (see
> > + * kvm_s2_handle_perm_fault).
> > + *
> > + * Also encode the level of the original translation in the SW bits of the leaf
> > + * entry as a proxy for the span of that translation. This will be retrieved on
> > + * TLB invalidation from the guest and used to limit the invalidation scope if a
> > + * TTL hint or a range isn't provided.
> > + */
> > +static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
> > +                                   enum kvm_pgtable_prot *prot,
> > +                                   bool *writable)
> > +{
> > +     *writable &= kvm_s2_trans_writable(nested);
> > +     if (!kvm_s2_trans_readable(nested))
> > +             *prot &= ~KVM_PGTABLE_PROT_R;
> > +
> > +     *prot |= kvm_encode_nested_level(nested);
> > +}
> > +
> >   static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >                         struct kvm_s2_trans *nested,
> >                         struct kvm_memory_slot *memslot, unsigned long hva,
> >                         bool fault_is_perm)
> >   {
> >       int ret = 0;
> > -     bool write_fault, writable, force_pte = false;
> > +     bool topup_memcache;
> > +     bool write_fault, writable;
> >       bool exec_fault, mte_allowed;
> >       bool device = false, vfio_allow_any_uc = false;
> >       unsigned long mmu_seq;
> > @@ -1484,6 +1527,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >       gfn_t gfn;
> >       kvm_pfn_t pfn;
> >       bool logging_active = memslot_is_logging(memslot);
> > +     bool force_pte = logging_active || is_protected_kvm_enabled();
> >       long vma_pagesize, fault_granule;
> >       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> >       struct kvm_pgtable *pgt;
> > @@ -1501,28 +1545,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >               return -EFAULT;
> >       }
> >
> > -     if (!is_protected_kvm_enabled())
> > -             memcache = &vcpu->arch.mmu_page_cache;
> > -     else
> > -             memcache = &vcpu->arch.pkvm_memcache;
> > -
> >       /*
> >        * Permission faults just need to update the existing leaf entry,
> >        * and so normally don't require allocations from the memcache. The
> >        * only exception to this is when dirty logging is enabled at runtime
> >        * and a write fault needs to collapse a block entry into a table.
> >        */
> > -     if (!fault_is_perm || (logging_active && write_fault)) {
> > -             int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
> > -
> > -             if (!is_protected_kvm_enabled())
> > -                     ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
> > -             else
> > -                     ret = topup_hyp_memcache(memcache, min_pages);
> > -
> > -             if (ret)
> > -                     return ret;
> > -     }
> > +     topup_memcache = !fault_is_perm || (logging_active && write_fault);
> > +     ret = prepare_mmu_memcache(vcpu, topup_memcache, &memcache);
> > +     if (ret)
> > +             return ret;
> >
> >       /*
> >        * Let's check if we will get back a huge page backed by hugetlbfs, or
> > @@ -1536,16 +1568,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >               return -EFAULT;
> >       }
> >
> > -     /*
> > -      * logging_active is guaranteed to never be true for VM_PFNMAP
> > -      * memslots.
> > -      */
> > -     if (logging_active || is_protected_kvm_enabled()) {
> > -             force_pte = true;
> > +     if (force_pte)
> >               vma_shift = PAGE_SHIFT;
> > -     } else {
> > +     else
> >               vma_shift = get_vma_page_shift(vma, hva);
> > -     }
> >
> >       switch (vma_shift) {
> >   #ifndef __PAGETABLE_PMD_FOLDED
> > @@ -1597,7 +1623,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >                       max_map_size = PAGE_SIZE;
> >
> >               force_pte = (max_map_size == PAGE_SIZE);
> > -             vma_pagesize = min(vma_pagesize, (long)max_map_size);
> > +             vma_pagesize = min_t(long, vma_pagesize, max_map_size);
> >       }
> >
> >       /*
> > @@ -1626,7 +1652,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >        * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> >        * with the smp_wmb() in kvm_mmu_invalidate_end().
> >        */
> > -     mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> > +     mmu_seq = kvm->mmu_invalidate_seq;
> >       mmap_read_unlock(current->mm);
> >
> >       pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> > @@ -1661,24 +1687,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >       if (exec_fault && device)
> >               return -ENOEXEC;
> >
> > -     /*
> > -      * Potentially reduce shadow S2 permissions to match the guest's own
> > -      * S2. For exec faults, we'd only reach this point if the guest
> > -      * actually allowed it (see kvm_s2_handle_perm_fault).
> > -      *
> > -      * Also encode the level of the original translation in the SW bits
> > -      * of the leaf entry as a proxy for the span of that translation.
> > -      * This will be retrieved on TLB invalidation from the guest and
> > -      * used to limit the invalidation scope if a TTL hint or a range
> > -      * isn't provided.
> > -      */
> > -     if (nested) {
> > -             writable &= kvm_s2_trans_writable(nested);
> > -             if (!kvm_s2_trans_readable(nested))
> > -                     prot &= ~KVM_PGTABLE_PROT_R;
> > -
> > -             prot |= kvm_encode_nested_level(nested);
> > -     }
> > +     if (nested)
> > +             adjust_nested_fault_perms(nested, &prot, &writable);
> >
> >       kvm_fault_lock(kvm);
> >       pgt = vcpu->arch.hw_mmu->pgt;
>



* Re: [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-06-09  4:08   ` Gavin Shan
@ 2025-06-09  7:04     ` Fuad Tabba
  2025-06-09  9:06       ` Gavin Shan
  0 siblings, 1 reply; 56+ messages in thread
From: Fuad Tabba @ 2025-06-09  7:04 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

Hi Gavin,

On Mon, 9 Jun 2025 at 05:08, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Fuad,
>
> On 6/6/25 1:37 AM, Fuad Tabba wrote:
> > Add arm64 support for handling guest page faults on guest_memfd backed
> > memslots. Until guest_memfd supports huge pages, the fault granule is
> > restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/arm64/kvm/mmu.c | 93 ++++++++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 90 insertions(+), 3 deletions(-)
> >
>
> One comment below. Otherwise, it looks good to me.
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
>
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index ce80be116a30..f14925fe6144 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1508,6 +1508,89 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
> >       *prot |= kvm_encode_nested_level(nested);
> >   }
> >
> > +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> > +
> > +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +                   struct kvm_s2_trans *nested,
> > +                   struct kvm_memory_slot *memslot, bool is_perm)
> > +{
> > +     bool logging, write_fault, exec_fault, writable;
> > +     enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> > +     enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > +     struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > +     struct page *page;
> > +     struct kvm *kvm = vcpu->kvm;
> > +     void *memcache;
> > +     kvm_pfn_t pfn;
> > +     gfn_t gfn;
> > +     int ret;
> > +
> > +     ret = prepare_mmu_memcache(vcpu, !is_perm, &memcache);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (nested)
> > +             gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> > +     else
> > +             gfn = fault_ipa >> PAGE_SHIFT;
> > +
> > +     logging = memslot_is_logging(memslot);
> > +     write_fault = kvm_is_write_fault(vcpu);
> > +     exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> > +
> > +     if (write_fault && exec_fault) {
> > +             kvm_err("Simultaneous write and execution fault\n");
> > +             return -EFAULT;
> > +     }
> > +
> > +     if (is_perm && !write_fault && !exec_fault) {
> > +             kvm_err("Unexpected L2 read permission error\n");
> > +             return -EFAULT;
> > +     }
> > +
> > +     ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> > +     if (ret) {
> > +             kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> > +                                           write_fault, exec_fault, false);
> > +             return ret;
> > +     }
> > +
>
> Only -EFAULT or -EHWPOISON should be returned here, as documented in
> virt/kvm/api.rst. Besides, shouldn't kvm_send_hwpoison_signal() be executed
> when -EHWPOISON is returned from kvm_gmem_get_pfn()? :-)

This is a bit different since we don't have a VMA. Refer to the discussion here:

https://lore.kernel.org/all/20250514212653.1011484-1-jthoughton@google.com/
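
(Without a VMA there is no hva to feed kvm_send_hwpoison_signal(), so
the error is surfaced to userspace instead. Roughly what the VMM would
see -- a sketch, assuming -EHWPOISON is propagated out of KVM_RUN as
documented for KVM_EXIT_MEMORY_FAULT in virt/kvm/api.rst:)

	/* KVM_RUN returned -1 with errno == EFAULT or EHWPOISON */
	if (run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
		__u64 gpa  = run->memory_fault.gpa;
		__u64 size = run->memory_fault.size;

		/* e.g. punch a hole in the guest_memfd, or stop the guest */
	}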

Thanks!
/fuad

> Thanks,
> Gavin
>
> > +     writable = !(memslot->flags & KVM_MEM_READONLY) &&
> > +                (!logging || write_fault);
> > +
> > +     if (nested)
> > +             adjust_nested_fault_perms(nested, &prot, &writable);
> > +
> > +     if (writable)
> > +             prot |= KVM_PGTABLE_PROT_W;
> > +
> > +     if (exec_fault ||
> > +         (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
> > +          (!nested || kvm_s2_trans_executable(nested))))
> > +             prot |= KVM_PGTABLE_PROT_X;
> > +
> > +     kvm_fault_lock(kvm);
> > +     if (is_perm) {
> > +             /*
> > +              * Drop the SW bits in favour of those stored in the
> > +              * PTE, which will be preserved.
> > +              */
> > +             prot &= ~KVM_NV_GUEST_MAP_SZ;
> > +             ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
> > +     } else {
> > +             ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
> > +                                          __pfn_to_phys(pfn), prot,
> > +                                          memcache, flags);
> > +     }
> > +     kvm_release_faultin_page(kvm, page, !!ret, writable);
> > +     kvm_fault_unlock(kvm);
> > +
> > +     if (writable && !ret)
> > +             mark_page_dirty_in_slot(kvm, memslot, gfn);
> > +
> > +     return ret != -EAGAIN ? ret : 0;
> > +}
> > +
> >   static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >                         struct kvm_s2_trans *nested,
> >                         struct kvm_memory_slot *memslot, unsigned long hva,
> > @@ -1532,7 +1615,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> >       struct kvm_pgtable *pgt;
> >       struct page *page;
> > -     enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
> > +     enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> >
> >       if (fault_is_perm)
> >               fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
> > @@ -1959,8 +2042,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >               goto out_unlock;
> >       }
> >
> > -     ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> > -                          esr_fsc_is_permission_fault(esr));
> > +     if (kvm_slot_has_gmem(memslot))
> > +             ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> > +                              esr_fsc_is_permission_fault(esr));
> > +     else
> > +             ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> > +                                  esr_fsc_is_permission_fault(esr));
> >       if (ret == 0)
> >               ret = 1;
> >   out:
>



* Re: [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2025-06-08 23:43   ` Gavin Shan
@ 2025-06-09  7:06     ` Fuad Tabba
  0 siblings, 0 replies; 56+ messages in thread
From: Fuad Tabba @ 2025-06-09  7:06 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On Mon, 9 Jun 2025 at 00:44, Gavin Shan <gshan@redhat.com> wrote:
>
> On 6/6/25 1:38 AM, Fuad Tabba wrote:
> > Expand the guest_memfd selftests to include testing mapping guest
> > memory for VM types that support it.
> >
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   .../testing/selftests/kvm/guest_memfd_test.c  | 201 ++++++++++++++++--
> >   1 file changed, 180 insertions(+), 21 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> > index 341ba616cf55..1612d3adcd0d 100644
> > --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> > +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> > @@ -13,6 +13,8 @@
> >
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>


Thanks for the reviews!
/fuad



* Re: [PATCH v11 13/18] KVM: arm64: Refactor user_mem_abort()
  2025-06-09  7:01     ` Fuad Tabba
@ 2025-06-09  9:02       ` Gavin Shan
  0 siblings, 0 replies; 56+ messages in thread
From: Gavin Shan @ 2025-06-09  9:02 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

Hi Fuad,

On 6/9/25 5:01 PM, Fuad Tabba wrote:
> On Mon, 9 Jun 2025 at 01:27, Gavin Shan <gshan@redhat.com> wrote:
>>
>> On 6/6/25 1:37 AM, Fuad Tabba wrote:
>>> To simplify the code and to make the assumptions clearer,
>>> refactor user_mem_abort() by immediately setting force_pte to
>>> true if the conditions are met.
>>>
>>> Remove the comment about logging_active being guaranteed to never be
>>> true for VM_PFNMAP memslots, since it's not actually correct.
>>>
>>> Move code that will be reused in the following patch into separate
>>> functions.
>>>
>>> Other small instances of tidying up.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>>    arch/arm64/kvm/mmu.c | 100 ++++++++++++++++++++++++-------------------
>>>    1 file changed, 55 insertions(+), 45 deletions(-)
>>>
>>
>> One nitpick below in case v12 is needed. Either way, it looks good to me:
>>
>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index eeda92330ade..ce80be116a30 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -1466,13 +1466,56 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>>>        return vma->vm_flags & VM_MTE_ALLOWED;
>>>    }
>>>
>>> +static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
>>> +                             void **memcache)
>>> +{
>>> +     int min_pages;
>>> +
>>> +     if (!is_protected_kvm_enabled())
>>> +             *memcache = &vcpu->arch.mmu_page_cache;
>>> +     else
>>> +             *memcache = &vcpu->arch.pkvm_memcache;
>>> +
>>> +     if (!topup_memcache)
>>> +             return 0;
>>> +
>>
>> It's unnecessary to initialize 'memcache' when topup_memcache is false.
> 
> I thought about this before, and I _think_ you're right. However, I
> couldn't completely convince myself that the code would always be
> functionally equivalent (looking at the condition for
> kvm_pgtable_stage2_relax_perms() at the end of the function), which is
> why, if I were to do that, I'd do it as a separate patch.
> 

Thanks for the pointer, which I hadn't noticed. Yeah, it's out of scope
and can be fixed up in a separate patch after this series gets merged.
Please leave it as is, and sorry for the noise.

To follow up on the discussion, I think it's safe to skip initializing
'memcache' when 'topup_memcache' is false. The current conditions that
turn 'topup_memcache' true guarantee that kvm_pgtable_stage2_map() will
be executed; in other words, kvm_pgtable_stage2_relax_perms() is what
runs when 'topup_memcache' is false. Besides, it sounds meaningless to
dereference 'vcpu->arch.mmu_page_cache' or 'vcpu->arch.pkvm_memcache'
without topping it up.

There is a comment explaining when 'topup_memcache' needs to be true
even for permission faults:

         /*
          * Permission faults just need to update the existing leaf entry,
          * and so normally don't require allocations from the memcache. The
          * only exception to this is when dirty logging is enabled at runtime
          * and a write fault needs to collapse a block entry into a table.
          */
         topup_memcache = !fault_is_perm || (logging_active && write_fault);

	if (fault_is_perm && vma_pagesize == fault_granule)
		kvm_pgtable_stage2_relax_perms(...);

> Thanks,
> /fuad
> 

Thanks,
Gavin

>>          if (!topup_memcache)
>>                  return 0;
>>
>>          min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
>>          if (!is_protected_kvm_enabled())
>>                  *memcache = &vcpu->arch.mmu_page_cache;
>>          else
>>                  *memcache = &vcpu->arch.pkvm_memcache;
>>
>> Thanks,
>> Gavin
>>
>>> +     min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
>>> +
>>> +     if (!is_protected_kvm_enabled())
>>> +             return kvm_mmu_topup_memory_cache(*memcache, min_pages);
>>> +
>>> +     return topup_hyp_memcache(*memcache, min_pages);
>>> +}
>>> +
>>> +/*
>>> + * Potentially reduce shadow S2 permissions to match the guest's own S2. For
>>> + * exec faults, we'd only reach this point if the guest actually allowed it (see
>>> + * kvm_s2_handle_perm_fault).
>>> + *
>>> + * Also encode the level of the original translation in the SW bits of the leaf
>>> + * entry as a proxy for the span of that translation. This will be retrieved on
>>> + * TLB invalidation from the guest and used to limit the invalidation scope if a
>>> + * TTL hint or a range isn't provided.
>>> + */
>>> +static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
>>> +                                   enum kvm_pgtable_prot *prot,
>>> +                                   bool *writable)
>>> +{
>>> +     *writable &= kvm_s2_trans_writable(nested);
>>> +     if (!kvm_s2_trans_readable(nested))
>>> +             *prot &= ~KVM_PGTABLE_PROT_R;
>>> +
>>> +     *prot |= kvm_encode_nested_level(nested);
>>> +}
>>> +
>>>    static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>                          struct kvm_s2_trans *nested,
>>>                          struct kvm_memory_slot *memslot, unsigned long hva,
>>>                          bool fault_is_perm)
>>>    {
>>>        int ret = 0;
>>> -     bool write_fault, writable, force_pte = false;
>>> +     bool topup_memcache;
>>> +     bool write_fault, writable;
>>>        bool exec_fault, mte_allowed;
>>>        bool device = false, vfio_allow_any_uc = false;
>>>        unsigned long mmu_seq;
>>> @@ -1484,6 +1527,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>        gfn_t gfn;
>>>        kvm_pfn_t pfn;
>>>        bool logging_active = memslot_is_logging(memslot);
>>> +     bool force_pte = logging_active || is_protected_kvm_enabled();
>>>        long vma_pagesize, fault_granule;
>>>        enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>>>        struct kvm_pgtable *pgt;
>>> @@ -1501,28 +1545,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>                return -EFAULT;
>>>        }
>>>
>>> -     if (!is_protected_kvm_enabled())
>>> -             memcache = &vcpu->arch.mmu_page_cache;
>>> -     else
>>> -             memcache = &vcpu->arch.pkvm_memcache;
>>> -
>>>        /*
>>>         * Permission faults just need to update the existing leaf entry,
>>>         * and so normally don't require allocations from the memcache. The
>>>         * only exception to this is when dirty logging is enabled at runtime
>>>         * and a write fault needs to collapse a block entry into a table.
>>>         */
>>> -     if (!fault_is_perm || (logging_active && write_fault)) {
>>> -             int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
>>> -
>>> -             if (!is_protected_kvm_enabled())
>>> -                     ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
>>> -             else
>>> -                     ret = topup_hyp_memcache(memcache, min_pages);
>>> -
>>> -             if (ret)
>>> -                     return ret;
>>> -     }
>>> +     topup_memcache = !fault_is_perm || (logging_active && write_fault);
>>> +     ret = prepare_mmu_memcache(vcpu, topup_memcache, &memcache);
>>> +     if (ret)
>>> +             return ret;
>>>
>>>        /*
>>>         * Let's check if we will get back a huge page backed by hugetlbfs, or
>>> @@ -1536,16 +1568,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>                return -EFAULT;
>>>        }
>>>
>>> -     /*
>>> -      * logging_active is guaranteed to never be true for VM_PFNMAP
>>> -      * memslots.
>>> -      */
>>> -     if (logging_active || is_protected_kvm_enabled()) {
>>> -             force_pte = true;
>>> +     if (force_pte)
>>>                vma_shift = PAGE_SHIFT;
>>> -     } else {
>>> +     else
>>>                vma_shift = get_vma_page_shift(vma, hva);
>>> -     }
>>>
>>>        switch (vma_shift) {
>>>    #ifndef __PAGETABLE_PMD_FOLDED
>>> @@ -1597,7 +1623,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>                        max_map_size = PAGE_SIZE;
>>>
>>>                force_pte = (max_map_size == PAGE_SIZE);
>>> -             vma_pagesize = min(vma_pagesize, (long)max_map_size);
>>> +             vma_pagesize = min_t(long, vma_pagesize, max_map_size);
>>>        }
>>>
>>>        /*
>>> @@ -1626,7 +1652,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>         * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
>>>         * with the smp_wmb() in kvm_mmu_invalidate_end().
>>>         */
>>> -     mmu_seq = vcpu->kvm->mmu_invalidate_seq;
>>> +     mmu_seq = kvm->mmu_invalidate_seq;
>>>        mmap_read_unlock(current->mm);
>>>
>>>        pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
>>> @@ -1661,24 +1687,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>        if (exec_fault && device)
>>>                return -ENOEXEC;
>>>
>>> -     /*
>>> -      * Potentially reduce shadow S2 permissions to match the guest's own
>>> -      * S2. For exec faults, we'd only reach this point if the guest
>>> -      * actually allowed it (see kvm_s2_handle_perm_fault).
>>> -      *
>>> -      * Also encode the level of the original translation in the SW bits
>>> -      * of the leaf entry as a proxy for the span of that translation.
>>> -      * This will be retrieved on TLB invalidation from the guest and
>>> -      * used to limit the invalidation scope if a TTL hint or a range
>>> -      * isn't provided.
>>> -      */
>>> -     if (nested) {
>>> -             writable &= kvm_s2_trans_writable(nested);
>>> -             if (!kvm_s2_trans_readable(nested))
>>> -                     prot &= ~KVM_PGTABLE_PROT_R;
>>> -
>>> -             prot |= kvm_encode_nested_level(nested);
>>> -     }
>>> +     if (nested)
>>> +             adjust_nested_fault_perms(nested, &prot, &writable);
>>>
>>>        kvm_fault_lock(kvm);
>>>        pgt = vcpu->arch.hw_mmu->pgt;
>>
> 




* Re: [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-06-09  7:04     ` Fuad Tabba
@ 2025-06-09  9:06       ` Gavin Shan
  0 siblings, 0 replies; 56+ messages in thread
From: Gavin Shan @ 2025-06-09  9:06 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

Hi Fuad,

On 6/9/25 5:04 PM, Fuad Tabba wrote: 
> On Mon, 9 Jun 2025 at 05:08, Gavin Shan <gshan@redhat.com> wrote:
>>
>> On 6/6/25 1:37 AM, Fuad Tabba wrote:
>>> Add arm64 support for handling guest page faults on guest_memfd backed
>>> memslots. Until guest_memfd supports huge pages, the fault granule is
>>> restricted to PAGE_SIZE.
>>>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>>    arch/arm64/kvm/mmu.c | 93 ++++++++++++++++++++++++++++++++++++++++++--
>>>    1 file changed, 90 insertions(+), 3 deletions(-)
>>>
>>
>> One comment below. Otherwise, it looks good to me.
>>
>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index ce80be116a30..f14925fe6144 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -1508,6 +1508,89 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
>>>        *prot |= kvm_encode_nested_level(nested);
>>>    }
>>>
>>> +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
>>> +
>>> +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>> +                   struct kvm_s2_trans *nested,
>>> +                   struct kvm_memory_slot *memslot, bool is_perm)
>>> +{
>>> +     bool logging, write_fault, exec_fault, writable;
>>> +     enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
>>> +     enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>>> +     struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
>>> +     struct page *page;
>>> +     struct kvm *kvm = vcpu->kvm;
>>> +     void *memcache;
>>> +     kvm_pfn_t pfn;
>>> +     gfn_t gfn;
>>> +     int ret;
>>> +
>>> +     ret = prepare_mmu_memcache(vcpu, !is_perm, &memcache);
>>> +     if (ret)
>>> +             return ret;
>>> +
>>> +     if (nested)
>>> +             gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
>>> +     else
>>> +             gfn = fault_ipa >> PAGE_SHIFT;
>>> +
>>> +     logging = memslot_is_logging(memslot);
>>> +     write_fault = kvm_is_write_fault(vcpu);
>>> +     exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
>>> +
>>> +     if (write_fault && exec_fault) {
>>> +             kvm_err("Simultaneous write and execution fault\n");
>>> +             return -EFAULT;
>>> +     }
>>> +
>>> +     if (is_perm && !write_fault && !exec_fault) {
>>> +             kvm_err("Unexpected L2 read permission error\n");
>>> +             return -EFAULT;
>>> +     }
>>> +
>>> +     ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
>>> +     if (ret) {
>>> +             kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
>>> +                                           write_fault, exec_fault, false);
>>> +             return ret;
>>> +     }
>>> +
>>
>> Only -EFAULT or -EHWPOISON should be returned here, as documented in
>> virt/kvm/api.rst. Besides, shouldn't kvm_send_hwpoison_signal() be executed
>> when -EHWPOISON is returned from kvm_gmem_get_pfn()? :-)
> 
> This is a bit different since we don't have a VMA. Refer to the discussion here:
> 
> https://lore.kernel.org/all/20250514212653.1011484-1-jthoughton@google.com/
> 

Thanks for the pointer. You're right that we don't have a VMA here.
Returning 'ret' to userspace seems like the practical way to handle it.

Thanks,
Gavin

>>
>>> +     writable = !(memslot->flags & KVM_MEM_READONLY) &&
>>> +                (!logging || write_fault);
>>> +
>>> +     if (nested)
>>> +             adjust_nested_fault_perms(nested, &prot, &writable);
>>> +
>>> +     if (writable)
>>> +             prot |= KVM_PGTABLE_PROT_W;
>>> +
>>> +     if (exec_fault ||
>>> +         (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
>>> +          (!nested || kvm_s2_trans_executable(nested))))
>>> +             prot |= KVM_PGTABLE_PROT_X;
>>> +
>>> +     kvm_fault_lock(kvm);
>>> +     if (is_perm) {
>>> +             /*
>>> +              * Drop the SW bits in favour of those stored in the
>>> +              * PTE, which will be preserved.
>>> +              */
>>> +             prot &= ~KVM_NV_GUEST_MAP_SZ;
>>> +             ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
>>> +     } else {
>>> +             ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
>>> +                                          __pfn_to_phys(pfn), prot,
>>> +                                          memcache, flags);
>>> +     }
>>> +     kvm_release_faultin_page(kvm, page, !!ret, writable);
>>> +     kvm_fault_unlock(kvm);
>>> +
>>> +     if (writable && !ret)
>>> +             mark_page_dirty_in_slot(kvm, memslot, gfn);
>>> +
>>> +     return ret != -EAGAIN ? ret : 0;
>>> +}
>>> +
>>>    static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>                          struct kvm_s2_trans *nested,
>>>                          struct kvm_memory_slot *memslot, unsigned long hva,
>>> @@ -1532,7 +1615,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>        enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>>>        struct kvm_pgtable *pgt;
>>>        struct page *page;
>>> -     enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
>>> +     enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
>>>
>>>        if (fault_is_perm)
>>>                fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
>>> @@ -1959,8 +2042,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>>>                goto out_unlock;
>>>        }
>>>
>>> -     ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
>>> -                          esr_fsc_is_permission_fault(esr));
>>> +     if (kvm_slot_has_gmem(memslot))
>>> +             ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
>>> +                              esr_fsc_is_permission_fault(esr));
>>> +     else
>>> +             ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
>>> +                                  esr_fsc_is_permission_fault(esr));
>>>        if (ret == 0)
>>>                ret = 1;
>>>    out:
>>
> 




* Re: [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-06-06  9:12   ` David Hildenbrand
  2025-06-06  9:30     ` Fuad Tabba
@ 2025-06-11  6:29     ` Shivank Garg
  2025-06-11 18:20       ` Ackerley Tng
  1 sibling, 1 reply; 56+ messages in thread
From: Shivank Garg @ 2025-06-11  6:29 UTC (permalink / raw)
  To: David Hildenbrand, Fuad Tabba, kvm, linux-arm-msm, linux-mm,
	kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
	liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
	steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
	quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
	catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
	qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
	hughd, jthoughton, peterx, pankaj.gupta, ira.weiny



On 6/6/2025 2:42 PM, David Hildenbrand wrote:
> On 05.06.25 17:37, Fuad Tabba wrote:
>> This patch enables support for shared memory in guest_memfd, including
>> mapping that memory from host userspace.
>>
>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
>> flag at creation time.
>>
>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>> ---
> 
> [...]
> 
>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>> +{
>> +    u64 flags;
>> +
>> +    if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
>> +        return false;
>> +
>> +    flags = (u64)inode->i_private;
> 
> Can probably do above
> 
> const u64 flags = (u64)inode->i_private;
> 
>> +
>> +    return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>> +}
>> +

I agree that using const adds some safety, clarity, and room for
optimization. What I don't understand is why we don't check the flags
directly, like...

return (u64)inode->i_private & GUEST_MEMFD_FLAG_SUPPORT_SHARED;

...which is more concise.

Thanks,
Shivank





* Re: [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-06-11  6:29     ` Shivank Garg
@ 2025-06-11 18:20       ` Ackerley Tng
  0 siblings, 0 replies; 56+ messages in thread
From: Ackerley Tng @ 2025-06-11 18:20 UTC (permalink / raw)
  To: Shivank Garg, David Hildenbrand, Fuad Tabba, kvm, linux-arm-msm,
	linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, mail, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

Shivank Garg <shivankg@amd.com> writes:

> On 6/6/2025 2:42 PM, David Hildenbrand wrote:
>> On 05.06.25 17:37, Fuad Tabba wrote:
>>> This patch enables support for shared memory in guest_memfd, including
>>> mapping that memory from host userspace.
>>>
>>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
>>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
>>> flag at creation time.
>>>
>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>> 
>> [...]
>> 
>>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>>> +{
>>> +    u64 flags;
>>> +
>>> +    if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
>>> +        return false;
>>> +
>>> +    flags = (u64)inode->i_private;
>> 
>> Can probably do above
>> 
>> const u64 flags = (u64)inode->i_private;
>> 
>>> +
>>> +    return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>>> +}
>>> +
>
> I agree that using const adds some safety, clarity, and room for
> optimization. What I don't understand is why we don't check the flags
> directly, like...
>
> return (u64)inode->i_private & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>
> ...which is more concise.

Imo, having an explicit variable name here, along with the cast, is
useful in reinforcing that guest_memfd uses inode->i_private to store
flags.

I would rather retain the explicit variable name, and it looks like it
was retained in v11.
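
E.g. (just the v11 code again, with David's const folded in):

static bool kvm_gmem_supports_shared(struct inode *inode)
{
	const u64 flags = (u64)inode->i_private;

	if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
		return false;

	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
}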

>
> Thanks,
> Shivank



Thread overview: 56+ messages
2025-06-05 15:37 [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 06/18] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 07/18] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-06-06  9:12   ` David Hildenbrand
2025-06-06  9:30     ` Fuad Tabba
2025-06-06  9:55       ` David Hildenbrand
2025-06-06 10:33         ` Fuad Tabba
2025-06-11  6:29     ` Shivank Garg
2025-06-11 18:20       ` Ackerley Tng
2025-06-08 23:42   ` Gavin Shan
2025-06-05 15:37 ` [PATCH v11 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
2025-06-06  9:13   ` David Hildenbrand
2025-06-08 23:42   ` Gavin Shan
2025-06-05 15:37 ` [PATCH v11 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
2025-06-06  9:14   ` David Hildenbrand
2025-06-06  9:48     ` Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 12/18] KVM: x86: Enable guest_memfd shared memory for SW-protected VMs Fuad Tabba
2025-06-05 15:49   ` David Hildenbrand
2025-06-05 16:11     ` Fuad Tabba
2025-06-05 17:35       ` David Hildenbrand
2025-06-05 17:43         ` Fuad Tabba
2025-06-05 17:45           ` David Hildenbrand
2025-06-05 18:29             ` Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
2025-06-09  0:27   ` Gavin Shan
2025-06-09  7:01     ` Fuad Tabba
2025-06-09  9:02       ` Gavin Shan
2025-06-05 15:37 ` [PATCH v11 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
2025-06-05 17:21   ` James Houghton
2025-06-06  7:31     ` Fuad Tabba
2025-06-06  7:39       ` David Hildenbrand
2025-06-09  4:08   ` Gavin Shan
2025-06-09  7:04     ` Fuad Tabba
2025-06-09  9:06       ` Gavin Shan
2025-06-05 15:37 ` [PATCH v11 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
2025-06-05 17:26   ` James Houghton
2025-06-09  0:29   ` Gavin Shan
2025-06-05 15:37 ` [PATCH v11 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
2025-06-05 15:37 ` [PATCH v11 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
2025-06-06  8:15   ` David Hildenbrand
2025-06-08 23:43   ` Gavin Shan
2025-06-05 15:38 ` [PATCH v11 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-06-05 22:07   ` James Houghton
2025-06-05 22:12     ` Sean Christopherson
2025-06-05 22:17       ` James Houghton
2025-06-06  8:14     ` Fuad Tabba
2025-06-08 23:43   ` Gavin Shan
2025-06-09  7:06     ` Fuad Tabba
2025-06-06  9:18 ` [PATCH v11 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
