linux-mm.kvack.org archive mirror
* [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs
@ 2025-07-17 16:27 Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
                   ` (20 more replies)
  0 siblings, 21 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Main changes since v14 [1]:
* Removed KVM_SW_PROTECTED_VM dependency on KVM_GENERIC_GMEM_POPULATE
* Fixed some commit messages

Based on Linux 6.16-rc6

This patch series enables host userspace mapping of guest_memfd-backed
memory for non-CoCo VMs. This is required for several evolving KVM use
cases:

* Allows VMMs like Firecracker to run guests entirely backed by
  guest_memfd [2]. This provides a unified memory management model for
  both confidential and non-confidential guests, simplifying VMM design.

* Enhanced security via direct map removal: when combined with Patrick's
  series for direct map removal [3], this provides additional hardening
  against Spectre-like transient-execution attacks by eliminating the
  need for host kernel direct maps of guest memory.

* Lays the groundwork for *restricted* mmap() support for
  guest_memfd-backed memory on CoCo platforms [4] that permit in-place
  sharing of guest memory with the host.

Patch breakdown:

* Patches 1-7: Primarily infrastructure refactorings and renames to
  decouple guest_memfd from the concept of "private" memory.

* Patches 8-9: Add support for the host to map guest_memfd-backed memory
  for non-CoCo VMs, which includes support for mmap() and fault
  handling. This is gated by a new configuration option, toggled by a
  new flag, and advertised to userspace by a new capability (introduced
  in patch 19).

* Patches 10-14: Implement x86 guest_memfd mmap support.

* Patches 15-18: Implement arm64 guest_memfd mmap support.

* Patch 19: Introduce the new capability to advertise this support and
  update the documentation.

* Patches 20-21: Update and expand selftests for guest_memfd to include
  mmap functionality and improve portability.

To test this patch series and boot a guest utilizing the new features,
please refer to the instructions in v8 of the series [5]. Note that
kvmtool for Linux 6.16 (available at [6]) is required, as the
KVM_CAP_GMEM_MMAP capability number has changed. Additionally, drop the
--sw_protected kvmtool parameter to test with the default VM type.

Cheers,
/fuad

[1] https://lore.kernel.org/all/20250715093350.2584932-1-tabba@google.com/
[2] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[3] https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
[4] https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
[5] https://lore.kernel.org/all/20250430165655.605595-1-tabba@google.com/
[6] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/guestmem-basic-6.16

Ackerley Tng (4):
  KVM: x86/mmu: Generalize private_max_mapping_level x86 op to
    max_mapping_level
  KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level
  KVM: x86/mmu: Handle guest page faults for guest_memfd with shared
    memory

Fuad Tabba (17):
  KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
  KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
    CONFIG_KVM_GENERIC_GMEM_POPULATE
  KVM: Introduce kvm_arch_supports_gmem()
  KVM: x86: Introduce kvm->arch.supports_gmem
  KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
  KVM: Fix comments that refer to slots_lock
  KVM: Fix comment that refers to kvm uapi header path
  KVM: guest_memfd: Allow host to map guest_memfd pages
  KVM: guest_memfd: Track guest_memfd mmap support in memslot
  KVM: x86: Enable guest_memfd mmap for default VM type
  KVM: arm64: Refactor user_mem_abort()
  KVM: arm64: Handle guest_memfd-backed guest page faults
  KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd
  KVM: arm64: Enable host mapping of shared guest_memfd memory
  KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP
  KVM: selftests: Do not use hardcoded page sizes in guest_memfd test
  KVM: selftests: guest_memfd mmap() test when mmap is supported

 Documentation/virt/kvm/api.rst                |   9 +
 arch/arm64/include/asm/kvm_host.h             |   4 +
 arch/arm64/kvm/Kconfig                        |   2 +
 arch/arm64/kvm/mmu.c                          | 203 ++++++++++++-----
 arch/arm64/kvm/nested.c                       |  41 +++-
 arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
 arch/x86/include/asm/kvm_host.h               |  18 +-
 arch/x86/kvm/Kconfig                          |   8 +-
 arch/x86/kvm/mmu/mmu.c                        | 114 ++++++----
 arch/x86/kvm/svm/sev.c                        |  12 +-
 arch/x86/kvm/svm/svm.c                        |   3 +-
 arch/x86/kvm/svm/svm.h                        |   4 +-
 arch/x86/kvm/vmx/main.c                       |   6 +-
 arch/x86/kvm/vmx/tdx.c                        |   6 +-
 arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
 arch/x86/kvm/x86.c                            |   5 +-
 include/linux/kvm_host.h                      |  64 +++++-
 include/uapi/linux/kvm.h                      |   2 +
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 208 +++++++++++++++---
 virt/kvm/Kconfig                              |  14 +-
 virt/kvm/Makefile.kvm                         |   2 +-
 virt/kvm/guest_memfd.c                        |  96 +++++++-
 virt/kvm/kvm_main.c                           |  14 +-
 virt/kvm/kvm_mm.h                             |   4 +-
 25 files changed, 665 insertions(+), 179 deletions(-)


base-commit: 347e9f5043c89695b01e66b3ed111755afcf1911
-- 
2.50.0.727.gbf7dc18ff4-goog



^ permalink raw reply	[flat|nested] 86+ messages in thread

* [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-21 15:17   ` Sean Christopherson
  2025-07-17 16:27 ` [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Rename the Kconfig option CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM. The
original name implied that the feature only supported "private" memory.
However, CONFIG_KVM_PRIVATE_MEM enables guest_memfd in general, which is
not exclusively for private memory. Subsequent patches in this series
will add guest_memfd support for non-CoCo VMs, whose memory is not
private.

Renaming the Kconfig option to CONFIG_KVM_GMEM more accurately reflects
its broader scope as the main Kconfig option for all guest_memfd-backed
memory. This provides clearer semantics for the option and avoids
confusion as new features are introduced.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 include/linux/kvm_host.h        | 14 +++++++-------
 virt/kvm/Kconfig                |  8 ++++----
 virt/kvm/Makefile.kvm           |  2 +-
 virt/kvm/kvm_main.c             |  4 ++--
 virt/kvm/kvm_mm.h               |  4 ++--
 6 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f7af967aa16f..acb25f935d84 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2275,7 +2275,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 		       int tdp_max_root_level, int tdp_huge_page_level);
 
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
 #else
 #define kvm_arch_has_private_mem(kvm) false
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3bde4fb5c6aa..755b09dcafce 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -601,7 +601,7 @@ struct kvm_memory_slot {
 	short id;
 	u16 as_id;
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	struct {
 		/*
 		 * Writes protected by kvm->slots_lock.  Acquiring a
@@ -719,10 +719,10 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
 #endif
 
 /*
- * Arch code must define kvm_arch_has_private_mem if support for private memory
- * is enabled.
+ * Arch code must define kvm_arch_has_private_mem if support for guest_memfd is
+ * enabled.
  */
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
+#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
 static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 {
 	return false;
@@ -2527,7 +2527,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
-	return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
+	return IS_ENABLED(CONFIG_KVM_GMEM) &&
 	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
 #else
@@ -2537,7 +2537,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 }
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
@@ -2550,7 +2550,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 727b542074e7..49df4e32bff7 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -112,19 +112,19 @@ config KVM_GENERIC_MEMORY_ATTRIBUTES
        depends on KVM_GENERIC_MMU_NOTIFIER
        bool
 
-config KVM_PRIVATE_MEM
+config KVM_GMEM
        select XARRAY_MULTI
        bool
 
 config KVM_GENERIC_PRIVATE_MEM
        select KVM_GENERIC_MEMORY_ATTRIBUTES
-       select KVM_PRIVATE_MEM
+       select KVM_GMEM
        bool
 
 config HAVE_KVM_ARCH_GMEM_PREPARE
        bool
-       depends on KVM_PRIVATE_MEM
+       depends on KVM_GMEM
 
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
-       depends on KVM_PRIVATE_MEM
+       depends on KVM_GMEM
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 724c89af78af..8d00918d4c8b 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -12,4 +12,4 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
-kvm-$(CONFIG_KVM_PRIVATE_MEM) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_KVM_GMEM) += $(KVM)/guest_memfd.o
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 222f0e894a0c..d5f0ec2d321f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4913,7 +4913,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_MEMORY_ATTRIBUTES:
 		return kvm_supported_mem_attributes(kvm);
 #endif
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
@@ -5347,7 +5347,7 @@ static long kvm_vm_ioctl(struct file *filp,
 	case KVM_GET_STATS_FD:
 		r = kvm_vm_ioctl_get_stats_fd(kvm);
 		break;
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 	case KVM_CREATE_GUEST_MEMFD: {
 		struct kvm_create_guest_memfd guest_memfd;
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index acef3f5c582a..ec311c0d6718 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -67,7 +67,7 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 }
 #endif /* HAVE_KVM_PFNCACHE */
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
 void kvm_gmem_init(struct module *module);
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -91,6 +91,6 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 {
 	WARN_ON_ONCE(1);
 }
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
 
 #endif /* __KVM_MM_H__ */
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-21 16:44   ` Sean Christopherson
  2025-07-17 16:27 ` [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem() Fuad Tabba
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The original name was vague regarding its functionality. This Kconfig
option specifically enables and gates the kvm_gmem_populate() function,
which is responsible for populating a GPA range with guest data.

The new name, KVM_GENERIC_GMEM_POPULATE, describes the purpose of the
option: to enable generic guest_memfd population mechanisms. This
improves clarity for developers and ensures the name accurately reflects
the functionality it controls, especially as guest_memfd support expands
beyond purely "private" memory scenarios.

Note that the KVM_X86_SW_PROTECTED_VM VM type does not need the populate
function. Therefore, when KVM_SW_PROTECTED_VM is enabled, select only
the configuration options it actually requires.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/Kconfig     | 7 ++++---
 include/linux/kvm_host.h | 2 +-
 virt/kvm/Kconfig         | 2 +-
 virt/kvm/guest_memfd.c   | 2 +-
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2eeffcec5382..12e723bb76cc 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -46,7 +46,8 @@ config KVM_X86
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
 	select KVM_GENERIC_PRE_FAULT_MEMORY
-	select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+	select KVM_GMEM if KVM_SW_PROTECTED_VM
+	select KVM_GENERIC_MEMORY_ATTRIBUTES if KVM_SW_PROTECTED_VM
 	select KVM_WERROR if WERROR
 
 config KVM
@@ -95,7 +96,7 @@ config KVM_SW_PROTECTED_VM
 config KVM_INTEL
 	tristate "KVM for Intel (and compatible) processors support"
 	depends on KVM && IA32_FEAT_CTL
-	select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
+	select KVM_GENERIC_GMEM_POPULATE if INTEL_TDX_HOST
 	select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
 	help
 	  Provides support for KVM on processors equipped with Intel's VT
@@ -157,7 +158,7 @@ config KVM_AMD_SEV
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	select ARCH_HAS_CC_PLATFORM
-	select KVM_GENERIC_PRIVATE_MEM
+	select KVM_GENERIC_GMEM_POPULATE
 	select HAVE_KVM_ARCH_GMEM_PREPARE
 	select HAVE_KVM_ARCH_GMEM_INVALIDATE
 	help
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 755b09dcafce..359baaae5e9f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2556,7 +2556,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
 #endif
 
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 /**
  * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
  *
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 49df4e32bff7..559c93ad90be 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -116,7 +116,7 @@ config KVM_GMEM
        select XARRAY_MULTI
        bool
 
-config KVM_GENERIC_PRIVATE_MEM
+config KVM_GENERIC_GMEM_POPULATE
        select KVM_GENERIC_MEMORY_ATTRIBUTES
        select KVM_GMEM
        bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..befea51bbc75 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -638,7 +638,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
 {
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem()
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  1:42   ` Xiaoyao Li
  2025-07-21 16:44   ` Sean Christopherson
  2025-07-17 16:27 ` [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem Fuad Tabba
                   ` (17 subsequent siblings)
  20 siblings, 2 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Introduce kvm_arch_supports_gmem() to explicitly indicate whether an
architecture supports guest_memfd.

Previously, kvm_arch_has_private_mem() was used to check for guest_memfd
support. However, this conflated guest_memfd with "private" memory,
implying that guest_memfd was exclusively for CoCo VMs or other private
memory use cases.

With the expansion of guest_memfd to support non-private memory, such as
shared host mappings, it is necessary to decouple these concepts. The
new kvm_arch_supports_gmem() function provides a clear way to check for
guest_memfd support.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h |  4 +++-
 include/linux/kvm_host.h        | 11 +++++++++++
 virt/kvm/kvm_main.c             |  4 ++--
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index acb25f935d84..bde811b2d303 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2277,8 +2277,10 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 #ifdef CONFIG_KVM_GMEM
 #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) kvm_arch_has_private_mem(kvm)
 #else
 #define kvm_arch_has_private_mem(kvm) false
+#define kvm_arch_supports_gmem(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
@@ -2331,7 +2333,7 @@ enum {
 #define HF_SMM_INSIDE_NMI_MASK	(1 << 2)
 
 # define KVM_MAX_NR_ADDRESS_SPACES	2
-/* SMM is currently unsupported for guests with private memory. */
+/* SMM is currently unsupported for guests with guest_memfd private memory. */
 # define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 359baaae5e9f..ab1bde048034 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -729,6 +729,17 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 }
 #endif
 
+/*
+ * Arch code must define kvm_arch_supports_gmem if support for guest_memfd is
+ * enabled.
+ */
+#if !defined(kvm_arch_supports_gmem) && !IS_ENABLED(CONFIG_KVM_GMEM)
+static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
+{
+	return false;
+}
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d5f0ec2d321f..162e2a69cc49 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1588,7 +1588,7 @@ static int check_memory_region_flags(struct kvm *kvm,
 {
 	u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
-	if (kvm_arch_has_private_mem(kvm))
+	if (kvm_arch_supports_gmem(kvm))
 		valid_flags |= KVM_MEM_GUEST_MEMFD;
 
 	/* Dirty logging private memory is not currently supported. */
@@ -4915,7 +4915,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 #ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
-		return !kvm || kvm_arch_has_private_mem(kvm);
+		return !kvm || kvm_arch_supports_gmem(kvm);
 #endif
 	default:
 		break;
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (2 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem() Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-21 16:45   ` Sean Christopherson
  2025-07-17 16:27 ` [PATCH v15 05/21] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Introduce a new boolean member, supports_gmem, to kvm->arch.

Previously, the has_private_mem boolean within kvm->arch was implicitly
used to indicate whether guest_memfd was supported for a KVM instance.
However, with the broader support for guest_memfd, it's not exclusively
for private or confidential memory. Therefore, it's necessary to
distinguish between a VM's general guest_memfd capabilities and its
support for private memory.

The new supports_gmem member explicitly indicates guest_memfd support
for a given VM, allowing has_private_mem to represent only support for
private memory.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/svm/svm.c          | 1 +
 arch/x86/kvm/vmx/tdx.c          | 1 +
 arch/x86/kvm/x86.c              | 4 ++--
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bde811b2d303..938b5be03d33 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1348,6 +1348,7 @@ struct kvm_arch {
 	u8 mmu_valid_gen;
 	u8 vm_type;
 	bool has_private_mem;
+	bool supports_gmem;
 	bool has_protected_state;
 	bool pre_fault_allowed;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
@@ -2277,7 +2278,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 #ifdef CONFIG_KVM_GMEM
 #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
-#define kvm_arch_supports_gmem(kvm) kvm_arch_has_private_mem(kvm)
+#define kvm_arch_supports_gmem(kvm)  ((kvm)->arch.supports_gmem)
 #else
 #define kvm_arch_has_private_mem(kvm) false
 #define kvm_arch_supports_gmem(kvm) false
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ab9b947dbf4f..d1c484eaa8ad 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5181,6 +5181,7 @@ static int svm_vm_init(struct kvm *kvm)
 		to_kvm_sev_info(kvm)->need_init = true;
 
 		kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
+		kvm->arch.supports_gmem = (type == KVM_X86_SNP_VM);
 		kvm->arch.pre_fault_allowed = !kvm->arch.has_private_mem;
 	}
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f31ccdeb905b..a3db6df245ee 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -632,6 +632,7 @@ int tdx_vm_init(struct kvm *kvm)
 
 	kvm->arch.has_protected_state = true;
 	kvm->arch.has_private_mem = true;
+	kvm->arch.supports_gmem = true;
 	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;
 
 	/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 357b9e3a6cef..adbdc2cc97d4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12780,8 +12780,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		return -EINVAL;
 
 	kvm->arch.vm_type = type;
-	kvm->arch.has_private_mem =
-		(type == KVM_X86_SW_PROTECTED_VM);
+	kvm->arch.has_private_mem = (type == KVM_X86_SW_PROTECTED_VM);
+	kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
 	/* Decided by the vendor code for other VM types.  */
 	kvm->arch.pre_fault_allowed =
 		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 05/21] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (3 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 06/21] KVM: Fix comments that refer to slots_lock Fuad Tabba
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() to improve
clarity and accurately reflect its purpose.

The function kvm_slot_can_be_private() was previously used to check if a
given kvm_memory_slot is backed by guest_memfd. However, its name
implied that the memory in such a slot was exclusively "private".

As guest_memfd support expands to include non-private memory (e.g.,
shared host mappings), it's important to remove this association. The
new name, kvm_slot_has_gmem(), states that the slot is backed by
guest_memfd without making assumptions about the memory's privacy
attributes.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 4 ++--
 arch/x86/kvm/svm/sev.c   | 4 ++--
 include/linux/kvm_host.h | 2 +-
 virt/kvm/guest_memfd.c   | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e06e2e89a8f..213904daf1e5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3285,7 +3285,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	bool is_private = kvm_slot_can_be_private(slot) &&
+	bool is_private = kvm_slot_has_gmem(slot) &&
 			  kvm_mem_is_private(kvm, gfn);
 
 	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
@@ -4498,7 +4498,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 {
 	int max_order, r;
 
-	if (!kvm_slot_can_be_private(fault->slot)) {
+	if (!kvm_slot_has_gmem(fault->slot)) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return -EFAULT;
 	}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b201f77fcd49..687392c5bf5d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2323,7 +2323,7 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	mutex_lock(&kvm->slots_lock);
 
 	memslot = gfn_to_memslot(kvm, params.gfn_start);
-	if (!kvm_slot_can_be_private(memslot)) {
+	if (!kvm_slot_has_gmem(memslot)) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -4678,7 +4678,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
 	}
 
 	slot = gfn_to_memslot(kvm, gfn);
-	if (!kvm_slot_can_be_private(slot)) {
+	if (!kvm_slot_has_gmem(slot)) {
 		pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
 				    gpa);
 		return;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab1bde048034..ed00c2b40e4b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -614,7 +614,7 @@ struct kvm_memory_slot {
 #endif
 };
 
-static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
+static inline bool kvm_slot_has_gmem(const struct kvm_memory_slot *slot)
 {
 	return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
 }
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index befea51bbc75..6db515833f61 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -654,7 +654,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 		return -EINVAL;
 
 	slot = gfn_to_memslot(kvm, start_gfn);
-	if (!kvm_slot_can_be_private(slot))
+	if (!kvm_slot_has_gmem(slot))
 		return -EINVAL;
 
 	file = kvm_gmem_get_file(slot);
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 06/21] KVM: Fix comments that refer to slots_lock
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (4 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 05/21] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 07/21] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Fix comments so that they refer to slots_lock instead of slots_locks
(remove trailing s).

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 2 +-
 virt/kvm/kvm_main.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ed00c2b40e4b..9c654dfb6dce 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -870,7 +870,7 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-	/* Protected by slots_locks (for writes) and RCU (for reads) */
+	/* Protected by slots_lock (for writes) and RCU (for reads) */
 	struct xarray mem_attr_array;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 162e2a69cc49..46bddac1dacd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -331,7 +331,7 @@ void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
 	 * All current use cases for flushing the TLBs for a specific memslot
 	 * are related to dirty logging, and many do the TLB flush out of
 	 * mmu_lock. The interaction between the various operations on memslot
-	 * must be serialized by slots_locks to ensure the TLB flush from one
+	 * must be serialized by slots_lock to ensure the TLB flush from one
 	 * operation is observed by any other operation on the same memslot.
 	 */
 	lockdep_assert_held(&kvm->slots_lock);
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 07/21] KVM: Fix comment that refers to kvm uapi header path
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (5 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 06/21] KVM: Fix comments that refer to slots_lock Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 08/21] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

The comment that points to the path where the user-visible memslot flags
are defined refers to an outdated path and contains a typo.

Update the comment to refer to the correct path.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9c654dfb6dce..1ec71648824c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -52,7 +52,7 @@
 /*
  * The bit 16 ~ bit 31 of kvm_userspace_memory_region::flags are internally
  * used in kvm, other bits are visible for userspace which are defined in
- * include/linux/kvm_h.
+ * include/uapi/linux/kvm.h.
  */
 #define KVM_MEMSLOT_INVALID	(1UL << 16)
 
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 08/21] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (6 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 07/21] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  2:56   ` Xiaoyao Li
  2025-07-17 16:27 ` [PATCH v15 09/21] KVM: guest_memfd: Track guest_memfd mmap support in memslot Fuad Tabba
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Introduce the core infrastructure to enable host userspace to mmap()
guest_memfd-backed memory. This is needed for several evolving KVM use
cases:

* Non-CoCo VM backing: Allows VMMs like Firecracker to run guests
  entirely backed by guest_memfd, even for non-CoCo VMs [1]. This
  provides a unified memory management model and simplifies guest memory
  handling.

* Direct map removal for enhanced security: This is an important step
  for direct map removal of guest memory [2]. By allowing host userspace
  to fault in guest_memfd pages directly, we can avoid maintaining host
  kernel direct maps of guest memory. This provides additional hardening
  against Spectre-like transient execution attacks by removing a
  potential attack surface within the kernel.

* Future guest_memfd features: This also lays the groundwork for future
  enhancements to guest_memfd, such as supporting huge pages and
  enabling in-place sharing of guest memory with the host for CoCo
  platforms that permit it [3].

Therefore, enable the basic mmap and fault handling logic within
guest_memfd. However, this functionality is not yet exposed to userspace
and remains inactive until two conditions are met in subsequent patches:

* Kconfig Gate (CONFIG_KVM_GMEM_SUPPORTS_MMAP): A new Kconfig option,
  KVM_GMEM_SUPPORTS_MMAP, that gates this mmap functionality at a system
  level. While the code changes in this patch might seem small, the
  Kconfig option is introduced to explicitly signal the intent to enable
  this new capability and to provide a clear compile-time switch for it.
  It also helps ensure that the necessary architecture-specific glue
  (like kvm_arch_supports_gmem_mmap()) is properly defined.

* Per-instance opt-in (GUEST_MEMFD_FLAG_MMAP): On a per-instance basis,
  this functionality is enabled by the guest_memfd flag
  GUEST_MEMFD_FLAG_MMAP, which will be set in the KVM_CREATE_GUEST_MEMFD
  ioctl. This flag is crucial because when host userspace maps
  guest_memfd pages, KVM must *not* manage these memory regions in
  the same way it does for traditional KVM memory slots. The presence of
  GUEST_MEMFD_FLAG_MMAP on a guest_memfd instance allows mmap() and
  faulting of guest_memfd memory to host userspace. Additionally, it
  informs KVM to always consume guest faults to this memory from
  guest_memfd, regardless of whether it is a shared or a private fault.
  This opt-in mechanism ensures compatibility and prevents conflicts
  with existing KVM memory management. This is a per-guest_memfd flag
  rather than a per-memslot or per-VM capability because the ability to
  mmap directly applies to the specific guest_memfd object, regardless
  of how it might be used within various memory slots or VMs.

[1] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[2] https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com
[3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com/T/#u

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 13 +++++++
 include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig         |  4 +++
 virt/kvm/guest_memfd.c   | 73 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1ec71648824c..9ac21985f3b5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -740,6 +740,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
 }
 #endif
 
+/*
+ * Returns true if this VM supports mmap() in guest_memfd.
+ *
+ * Arch code must define kvm_arch_supports_gmem_mmap if support for guest_memfd
+ * is enabled.
+ */
+#if !defined(kvm_arch_supports_gmem_mmap)
+static inline bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+	return false;
+}
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7a4c35ff03fe..3beafbf306af 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1596,6 +1596,7 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_MMAP	(1ULL << 0)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 559c93ad90be..fa4acbedb953 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
        depends on KVM_GMEM
+
+config KVM_GMEM_SUPPORTS_MMAP
+       select KVM_GMEM
+       bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6db515833f61..07a4b165471d 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static bool kvm_gmem_supports_mmap(struct inode *inode)
+{
+	const u64 flags = (u64)inode->i_private;
+
+	if (!IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP))
+		return false;
+
+	return flags & GUEST_MEMFD_FLAG_MMAP;
+}
+
+static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
+		return VM_FAULT_SIGBUS;
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		int err = PTR_ERR(folio);
+
+		if (err == -EAGAIN)
+			return VM_FAULT_RETRY;
+
+		return vmf_error(err);
+	}
+
+	if (WARN_ON_ONCE(folio_test_large(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		clear_highpage(folio_page(folio, 0));
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault_user_mapping,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (!kvm_gmem_supports_mmap(file_inode(file)))
+		return -ENODEV;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 flags = args->flags;
 	u64 valid_flags = 0;
 
+	if (kvm_arch_supports_gmem_mmap(kvm))
+		valid_flags |= GUEST_MEMFD_FLAG_MMAP;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
 
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 09/21] KVM: guest_memfd: Track guest_memfd mmap support in memslot
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (7 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 08/21] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  3:33   ` Xiaoyao Li
  2025-07-17 16:27 ` [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level Fuad Tabba
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Add a new internal flag, KVM_MEMSLOT_GMEM_ONLY, to the top half of
memslot->flags (which makes it strictly for KVM's internal use). This
flag tracks when a guest_memfd-backed memory slot supports host
userspace mmap operations, which implies that all memory, not just
private memory for CoCo VMs, is consumed through guest_memfd: "gmem
only".

This optimization avoids repeatedly checking the underlying guest_memfd
file for mmap support, which would otherwise require taking and
releasing a reference on the file for each check. By caching this
information directly in the memslot, we reduce overhead and simplify the
logic involved in handling guest_memfd-backed pages for host mappings.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 11 ++++++++++-
 virt/kvm/guest_memfd.c   |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9ac21985f3b5..d2218ec57ceb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -54,7 +54,8 @@
  * used in kvm, other bits are visible for userspace which are defined in
  * include/uapi/linux/kvm.h.
  */
-#define KVM_MEMSLOT_INVALID	(1UL << 16)
+#define KVM_MEMSLOT_INVALID			(1UL << 16)
+#define KVM_MEMSLOT_GMEM_ONLY			(1UL << 17)
 
 /*
  * Bit 63 of the memslot generation number is an "update in-progress flag",
@@ -2536,6 +2537,14 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
 }
 
+static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
+{
+	if (!IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP))
+		return false;
+
+	return slot->flags & KVM_MEMSLOT_GMEM_ONLY;
+}
+
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
 {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 07a4b165471d..2b00f8796a15 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -592,6 +592,8 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	 */
 	WRITE_ONCE(slot->gmem.file, file);
 	slot->gmem.pgoff = start;
+	if (kvm_gmem_supports_mmap(inode))
+		slot->flags |= KVM_MEMSLOT_GMEM_ONLY;
 
 	xa_store_range(&gmem->bindings, start, end - 1, slot, GFP_KERNEL);
 	filemap_invalidate_unlock(inode->i_mapping);
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (8 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 09/21] KVM: guest_memfd: Track guest_memfd mmap support in memslot Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  6:19   ` Xiaoyao Li
  2025-07-21 19:46   ` Sean Christopherson
  2025-07-17 16:27 ` [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level Fuad Tabba
                   ` (10 subsequent siblings)
  20 siblings, 2 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

Generalize the private_max_mapping_level x86 operation to
max_mapping_level.

The private_max_mapping_level operation allows platform-specific code to
limit mapping levels (e.g., forcing 4K pages for certain memory types).
While it was previously used exclusively for private memory, guest_memfd
can now back both private and non-private memory. Platforms may have
specific mapping level restrictions that apply to guest_memfd memory
regardless of its privacy attribute. Therefore, generalize this
operation.

Rename the operation: Remove the "private" prefix to reflect its
broader applicability to any guest_memfd-backed memory.

Pass kvm_page_fault information: The operation is updated to receive a
struct kvm_page_fault object instead of just the pfn. This provides
platform-specific implementations (e.g., for TDX or SEV) with additional
context about the fault, such as whether it is private or shared,
allowing them to apply different mapping level rules as needed.

Enforce "private-only" behavior (for now): Since the current consumers
of this hook (TDX and SEV) still primarily use it to enforce private
memory constraints, platform-specific implementations are made to return
0 for non-private pages. A return value of 0 signals to callers that
platform-specific input should be ignored for that particular fault,
indicating no specific platform-imposed mapping level limits for
non-private pages. This allows the core MMU to continue determining the
mapping level based on generic rules for such cases.

Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/include/asm/kvm_host.h    |  2 +-
 arch/x86/kvm/mmu/mmu.c             | 11 ++++++-----
 arch/x86/kvm/svm/sev.c             |  8 ++++++--
 arch/x86/kvm/svm/svm.c             |  2 +-
 arch/x86/kvm/svm/svm.h             |  4 ++--
 arch/x86/kvm/vmx/main.c            |  6 +++---
 arch/x86/kvm/vmx/tdx.c             |  5 ++++-
 arch/x86/kvm/vmx/x86_ops.h         |  2 +-
 9 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 8d50e3e0a19b..02301fbad449 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -146,7 +146,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
-KVM_X86_OP_OPTIONAL_RET0(private_max_mapping_level)
+KVM_X86_OP_OPTIONAL_RET0(max_mapping_level)
 KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 938b5be03d33..543d09fd4bca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1907,7 +1907,7 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
-	int (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);
+	int (*max_mapping_level)(struct kvm *kvm, struct kvm_page_fault *fault);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 213904daf1e5..bb925994cbc5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4467,9 +4467,11 @@ static inline u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
-					u8 max_level, int gmem_order)
+static u8 kvm_max_private_mapping_level(struct kvm *kvm,
+					struct kvm_page_fault *fault,
+					int gmem_order)
 {
+	u8 max_level = fault->max_level;
 	u8 req_max_level;
 
 	if (max_level == PG_LEVEL_4K)
@@ -4479,7 +4481,7 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
+	req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
 	if (req_max_level)
 		max_level = min(max_level, req_max_level);
 
@@ -4511,8 +4513,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	}
 
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
-	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
-							 fault->max_level, max_order);
+	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
 
 	return RET_PF_CONTINUE;
 }
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 687392c5bf5d..dd470e26f6a0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -29,6 +29,7 @@
 #include <asm/msr.h>
 #include <asm/sev.h>
 
+#include "mmu/mmu_internal.h"
 #include "mmu.h"
 #include "x86.h"
 #include "svm.h"
@@ -4906,7 +4907,7 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 	}
 }
 
-int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+int sev_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
 {
 	int level, rc;
 	bool assigned;
@@ -4914,7 +4915,10 @@ int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
 	if (!sev_snp_guest(kvm))
 		return 0;
 
-	rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+	if (!fault->is_private)
+		return 0;
+
+	rc = snp_lookup_rmpentry(fault->pfn, &assigned, &level);
 	if (rc || !assigned)
 		return PG_LEVEL_4K;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d1c484eaa8ad..6ad047189210 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5347,7 +5347,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.gmem_prepare = sev_gmem_prepare,
 	.gmem_invalidate = sev_gmem_invalidate,
-	.private_max_mapping_level = sev_private_max_mapping_level,
+	.max_mapping_level = sev_max_mapping_level,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e6f3c6a153a0..c2579f7df734 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -787,7 +787,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
-int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+int sev_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
 void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa);
 #else
@@ -816,7 +816,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 	return 0;
 }
 static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
-static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+static inline int sev_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
 {
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1e02e567b57..8e53554932ba 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -871,10 +871,10 @@ static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return tdx_vcpu_ioctl(vcpu, argp);
 }
 
-static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+static int vt_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
 {
 	if (is_td(kvm))
-		return tdx_gmem_private_max_mapping_level(kvm, pfn);
+		return tdx_gmem_max_mapping_level(kvm, fault);
 
 	return 0;
 }
@@ -1044,7 +1044,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.mem_enc_ioctl = vt_op_tdx_only(mem_enc_ioctl),
 	.vcpu_mem_enc_ioctl = vt_op_tdx_only(vcpu_mem_enc_ioctl),
 
-	.private_max_mapping_level = vt_op_tdx_only(gmem_private_max_mapping_level)
+	.max_mapping_level = vt_op_tdx_only(gmem_max_mapping_level)
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a3db6df245ee..7f652241491a 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3322,8 +3322,11 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return ret;
 }
 
-int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+int tdx_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
 {
+	if (!fault->is_private)
+		return 0;
+
 	return PG_LEVEL_4K;
 }
 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index b4596f651232..ca7bc9e0fce5 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -163,7 +163,7 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 void tdx_flush_tlb_current(struct kvm_vcpu *vcpu);
 void tdx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
-int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+int tdx_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault);
 #endif
 
 #endif /* __KVM_X86_VMX_X86_OPS_H */
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (9 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  5:10   ` Xiaoyao Li
  2025-07-17 16:27 ` [PATCH v15 12/21] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level Fuad Tabba
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
pointer and rename it to kvm_gmem_max_mapping_level().

The max_mapping_level x86 operation (previously private_max_mapping_level)
is designed to potentially be called without an active page fault, for
instance, when kvm_mmu_max_mapping_level() is determining the maximum
mapping level for a gfn proactively.

Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
safely handle a NULL fault argument. This aligns its interface with the
kvm_x86_ops.max_mapping_level operation it wraps, which can also be
called with NULL.

Rename function to kvm_gmem_max_mapping_level(): This reinforces that
the function's scope is for guest_memfd-backed memory, which can be
either private or non-private, removing any remaining "private"
connotation from its name.

Optimize max_level checks: Introduce a check in the caller to skip
querying for max_mapping_level if the current max_level is already
PG_LEVEL_4K, as no further reduction is possible.
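
The clamping behaviour with and without an active fault can be modelled in
plain C (a minimal userspace sketch; the stubbed vendor callback and its
return values are hypothetical, only the clamping logic mirrors the patch):

```c
#include <assert.h>
#include <stddef.h>

enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

struct kvm_page_fault { unsigned char max_level; };

/*
 * Stub for the kvm_x86_ops.max_mapping_level callback: like the real
 * operation, it must tolerate fault == NULL (no active page fault).
 */
static int vendor_max_mapping_level(struct kvm_page_fault *fault)
{
	return fault ? PG_LEVEL_2M : 0; /* 0 means "no vendor constraint" */
}

/* Mirrors kvm_gmem_max_mapping_level(): clamp the order-derived level. */
static unsigned char gmem_max_mapping_level(int order_level,
					    struct kvm_page_fault *fault)
{
	unsigned char max_level = order_level;
	int req_max_level;

	if (max_level == PG_LEVEL_4K)
		return PG_LEVEL_4K; /* no further reduction is possible */

	req_max_level = vendor_max_mapping_level(fault);
	if (req_max_level)
		max_level = max_level < req_max_level ? max_level
						      : req_max_level;
	return max_level;
}
```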

Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bb925994cbc5..6bd28fda0fd3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
-static u8 kvm_max_private_mapping_level(struct kvm *kvm,
-					struct kvm_page_fault *fault,
-					int gmem_order)
+static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
+				     struct kvm_page_fault *fault)
 {
-	u8 max_level = fault->max_level;
 	u8 req_max_level;
+	u8 max_level;
 
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
-
-	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
+	max_level = kvm_max_level_for_order(order);
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
@@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	}
 
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
-	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
+	if (fault->max_level >= PG_LEVEL_4K)
+		fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
+							      max_order, fault);
 
 	return RET_PF_CONTINUE;
 }
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 12/21] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (10 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  5:32   ` Xiaoyao Li
  2025-07-17 16:27 ` [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

Modify kvm_mmu_max_mapping_level() to consult guest_memfd for memory
regions backed by it when computing the maximum mapping level,
especially during huge page recovery.

Previously, kvm_mmu_max_mapping_level() was designed primarily for
host-backed memory and private pages. With guest_memfd now supporting
non-private memory, it's necessary to factor in guest_memfd's influence
on mapping levels for such memory.

Since guest_memfd can now be used for non-private memory, make
kvm_mmu_max_mapping_level() take input from guest_memfd when recovering
huge pages.

Input is taken from guest_memfd as long as a fault to that slot and gfn
would have been served from guest_memfd. For now, take a shortcut if the
slot and gfn point to memory that is private, since recovering huge
pages isn't supported for private memory yet.

Since guest_memfd memory can also be faulted into host page tables,
__kvm_mmu_max_mapping_level() still applies, as lpage_info and the host
page tables still need to be consulted.

Move functions kvm_max_level_for_order() and
kvm_gmem_max_mapping_level() so kvm_mmu_max_mapping_level() can use
those functions.
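
The order-to-level conversion being moved can be modelled directly (a
sketch assuming x86 with 4K base pages, where each level spans 9 more gfn
bits, so a 2M page covers 2^9 gfns and a 1G page 2^18):

```c
#include <assert.h>

enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* Stand-in for KVM_HPAGE_GFN_SHIFT(level) with 4K base pages. */
static int hpage_gfn_shift(int level)
{
	return (level - PG_LEVEL_4K) * 9;
}

/* Mirrors kvm_max_level_for_order(): largest level that fits in order. */
static int max_level_for_order(int order)
{
	if (order >= hpage_gfn_shift(PG_LEVEL_1G))
		return PG_LEVEL_1G;
	if (order >= hpage_gfn_shift(PG_LEVEL_2M))
		return PG_LEVEL_2M;
	return PG_LEVEL_4K;
}
```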

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 90 ++++++++++++++++++++++++----------------
 include/linux/kvm_host.h |  7 ++++
 virt/kvm/guest_memfd.c   | 17 ++++++++
 3 files changed, 79 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6bd28fda0fd3..94be15cde6da 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3282,13 +3282,67 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 	return min(host_level, max_level);
 }
 
+static u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
+				     struct kvm_page_fault *fault)
+{
+	u8 req_max_level;
+	u8 max_level;
+
+	max_level = kvm_max_level_for_order(order);
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
+	req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
+	if (req_max_level)
+		max_level = min(max_level, req_max_level);
+
+	return max_level;
+}
+
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	bool is_private = kvm_slot_has_gmem(slot) &&
 			  kvm_mem_is_private(kvm, gfn);
+	int max_level = PG_LEVEL_NUM;
+
+	/*
+	 * For now, kvm_mmu_max_mapping_level() is only called from
+	 * kvm_mmu_recover_huge_pages(), and that's not yet supported for
+	 * private memory, hence we can take a shortcut and return early.
+	 */
+	if (is_private)
+		return PG_LEVEL_4K;
 
-	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
+	/*
+	 * For non-private pages that would have been faulted from guest_memfd,
+	 * let guest_memfd influence max_mapping_level.
+	 */
+	if (kvm_memslot_is_gmem_only(slot)) {
+		int order = kvm_gmem_mapping_order(slot, gfn);
+
+		max_level = min(max_level,
+				kvm_gmem_max_mapping_level(kvm, order, NULL));
+	}
+
+	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, is_private);
 }
 
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -4450,40 +4504,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 		vcpu->stat.pf_fixed++;
 }
 
-static inline u8 kvm_max_level_for_order(int order)
-{
-	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
-
-	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
-		return PG_LEVEL_1G;
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
-		return PG_LEVEL_2M;
-
-	return PG_LEVEL_4K;
-}
-
-static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
-				     struct kvm_page_fault *fault)
-{
-	u8 req_max_level;
-	u8 max_level;
-
-	max_level = kvm_max_level_for_order(order);
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
-
-	req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
-	if (req_max_level)
-		max_level = min(max_level, req_max_level);
-
-	return max_level;
-}
-
 static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
 				      struct kvm_page_fault *fault, int r)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d2218ec57ceb..662271314778 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2574,6 +2574,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2583,6 +2584,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
+					 gfn_t gfn)
+{
+	WARN_ONCE(1, "Unexpected call since gmem is disabled.");
+	return 0;
+}
 #endif /* CONFIG_KVM_GMEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 2b00f8796a15..d01bd7a2c2bd 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -713,6 +713,23 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
+/**
+ * kvm_gmem_mapping_order() - Get the mapping order for this @gfn in @slot.
+ *
+ * @slot: the memslot that gfn belongs to.
+ * @gfn: the gfn to look up mapping order for.
+ *
+ * This is equal to max_order that would be returned if kvm_gmem_get_pfn() were
+ * called now.
+ *
+ * Return: the mapping order for this @gfn in @slot.
+ */
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
+
 #ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (11 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 12/21] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  6:09   ` Xiaoyao Li
  2025-07-21 16:47   ` Sean Christopherson
  2025-07-17 16:27 ` [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type Fuad Tabba
                   ` (7 subsequent siblings)
  20 siblings, 2 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

From: Ackerley Tng <ackerleytng@google.com>

Update the KVM MMU fault handler to service guest page faults
for memory slots backed by guest_memfd with mmap support. For such
slots, the MMU must always fault in pages directly from guest_memfd,
bypassing the host's userspace_addr.

This ensures that guest_memfd-backed memory is always handled through
the guest_memfd specific faulting path, regardless of whether it's for
private or non-private (shared) use cases.

Additionally, rename kvm_mmu_faultin_pfn_private() to
kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in
pages from guest_memfd for both private and non-private memory,
accommodating the new use cases.
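
The new dispatch predicate reduces to a simple disjunction (a sketch; the
struct layout below is illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct memslot { bool gmem_only; };

struct page_fault {
	bool is_private;
	struct memslot *slot;
};

/*
 * Mirrors fault_from_gmem(): guest_memfd serves every private fault, and
 * every fault (private or shared) on a slot backed only by guest_memfd.
 */
static bool fault_from_gmem(const struct page_fault *fault)
{
	return fault->is_private || fault->slot->gmem_only;
}
```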

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 94be15cde6da..ad5f337b496c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4511,8 +4511,8 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
 				 r == RET_PF_RETRY, fault->map_writable);
 }
 
-static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
-				       struct kvm_page_fault *fault)
+static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault)
 {
 	int max_order, r;
 
@@ -4536,13 +4536,18 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	return RET_PF_CONTINUE;
 }
 
+static bool fault_from_gmem(struct kvm_page_fault *fault)
+{
+	return fault->is_private || kvm_memslot_is_gmem_only(fault->slot);
+}
+
 static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 				 struct kvm_page_fault *fault)
 {
 	unsigned int foll = fault->write ? FOLL_WRITE : 0;
 
-	if (fault->is_private)
-		return kvm_mmu_faultin_pfn_private(vcpu, fault);
+	if (fault_from_gmem(fault))
+		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
 
 	foll |= FOLL_NOWAIT;
 	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (12 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  6:10   ` Xiaoyao Li
  2025-07-21 12:22   ` Xiaoyao Li
  2025-07-17 16:27 ` [PATCH v15 15/21] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
                   ` (6 subsequent siblings)
  20 siblings, 2 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Enable host userspace mmap support for guest_memfd-backed memory when
running KVM with the KVM_X86_DEFAULT_VM type:

* Define kvm_arch_supports_gmem_mmap() for KVM_X86_DEFAULT_VM: Introduce
  the architecture-specific kvm_arch_supports_gmem_mmap() macro,
  specifically enabling mmap support for KVM_X86_DEFAULT_VM instances.
  This macro, gated by CONFIG_KVM_GMEM_SUPPORTS_MMAP, ensures that only
  the default VM type can leverage guest_memfd mmap functionality on
  x86. This explicit enablement prevents CoCo VMs, which use guest_memfd
  primarily for private memory and rely on hardware-enforced privacy,
  from accidentally exposing guest memory via host userspace mappings.

* Select CONFIG_KVM_GMEM_SUPPORTS_MMAP in KVM_X86: Enable the
  CONFIG_KVM_GMEM_SUPPORTS_MMAP Kconfig option when KVM_X86 is selected.
  This ensures that the necessary code for guest_memfd mmap support
  (introduced earlier) is compiled into the kernel for x86. This Kconfig
  option acts as a system-wide gate for the guest_memfd mmap capability.
  It implicitly enables CONFIG_KVM_GMEM, making guest_memfd available,
  and then layers the mmap capability on top specifically for the
  default VM.

These changes make guest_memfd a more versatile memory backing for
standard KVM guests, allowing VMMs to use a unified guest_memfd model
for both private (CoCo) and non-private (default) VMs. This is a
prerequisite for use cases such as running Firecracker guests entirely
backed by guest_memfd and implementing direct map removal for non-CoCo
VMs.
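
The resulting per-VM gating can be sketched as a pair of pure predicates
(illustrative userspace model; the TDX enum value here is just a stand-in
for any non-default VM type, and the Kconfig gate is modelled as a
constant):

```c
#include <assert.h>
#include <stdbool.h>

enum vm_type {
	KVM_X86_DEFAULT_VM,
	KVM_X86_SW_PROTECTED_VM,
	KVM_X86_TDX_VM,	/* illustrative non-default type */
};

/* Stand-in for IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP). */
static const bool config_gmem_supports_mmap = true;

/* Mirrors kvm_arch_supports_gmem_mmap(): default VMs only. */
static bool supports_gmem_mmap(enum vm_type type)
{
	return config_gmem_supports_mmap && type == KVM_X86_DEFAULT_VM;
}

/* Mirrors the kvm_arch_init_vm() change: gmem itself is broader. */
static bool supports_gmem(enum vm_type type)
{
	return type == KVM_X86_DEFAULT_VM ||
	       type == KVM_X86_SW_PROTECTED_VM;
}
```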

Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 9 +++++++++
 arch/x86/kvm/Kconfig            | 1 +
 arch/x86/kvm/x86.c              | 3 ++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 543d09fd4bca..e1426adfa93e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2279,9 +2279,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 #ifdef CONFIG_KVM_GMEM
 #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
 #define kvm_arch_supports_gmem(kvm)  ((kvm)->arch.supports_gmem)
+
+/*
+ * CoCo VMs with hardware support that use guest_memfd only for backing private
+ * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
+ */
+#define kvm_arch_supports_gmem_mmap(kvm)		\
+	(IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&	\
+	 (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
 #else
 #define kvm_arch_has_private_mem(kvm) false
 #define kvm_arch_supports_gmem(kvm) false
+#define kvm_arch_supports_gmem_mmap(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 12e723bb76cc..4acecfb70811 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -48,6 +48,7 @@ config KVM_X86
 	select KVM_GENERIC_PRE_FAULT_MEMORY
 	select KVM_GMEM if KVM_SW_PROTECTED_VM
 	select KVM_GENERIC_MEMORY_ATTRIBUTES if KVM_SW_PROTECTED_VM
+	select KVM_GMEM_SUPPORTS_MMAP if X86_64
 	select KVM_WERROR if WERROR
 
 config KVM
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index adbdc2cc97d4..ca99187a566e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12781,7 +12781,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm->arch.vm_type = type;
 	kvm->arch.has_private_mem = (type == KVM_X86_SW_PROTECTED_VM);
-	kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
+	kvm->arch.supports_gmem =
+		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
 	/* Decided by the vendor code for other VM types.  */
 	kvm->arch.pre_fault_allowed =
 		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 15/21] KVM: arm64: Refactor user_mem_abort()
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (13 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Refactor user_mem_abort() to improve code clarity and simplify
assumptions within the function.

Key changes include:

* Immediately set force_pte to true at the beginning of the function if
  logging_active is true. This simplifies the flow and makes the
  condition for forcing a PTE more explicit.

* Remove the misleading comment stating that logging_active is
  guaranteed to never be true for VM_PFNMAP memslots, as this assertion
  is not entirely correct.

* Extract reusable code blocks into new helper functions:
  * prepare_mmu_memcache(): Encapsulates the logic for preparing and
    topping up the MMU page cache.
  * adjust_nested_fault_perms(): Isolates the adjustments to shadow S2
    permissions and the encoding of nested translation levels.

* Update min(a, (long)b) to min_t(long, a, b) for better type safety and
  consistency.

* Perform other minor tidying up of the code.

These changes primarily aim to simplify user_mem_abort() and make its
logic easier to understand and maintain, setting the stage for future
modifications.
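
The min_t() change guards against mixed-signedness comparisons; a
userspace sketch of the pitfall (in this particular call site both values
are non-negative sizes, so the change is about consistency rather than a
live bug, and `naive_min` below is a deliberately unsafe stand-in for a
cast-free comparison):

```c
#include <assert.h>

/* Simplified form of the kernel's min_t(): force both sides to one type. */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/*
 * A cast-free min on mixed long/unsigned long operands silently promotes
 * the signed side to unsigned, so -1 compares as ULONG_MAX.
 */
#define naive_min(x, y) ((x) < (y) ? (x) : (y))
```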

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 110 +++++++++++++++++++++++--------------------
 1 file changed, 59 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2942ec92c5a4..b3eacb400fab 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1470,13 +1470,56 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
+				void **memcache)
+{
+	int min_pages;
+
+	if (!is_protected_kvm_enabled())
+		*memcache = &vcpu->arch.mmu_page_cache;
+	else
+		*memcache = &vcpu->arch.pkvm_memcache;
+
+	if (!topup_memcache)
+		return 0;
+
+	min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
+
+	if (!is_protected_kvm_enabled())
+		return kvm_mmu_topup_memory_cache(*memcache, min_pages);
+
+	return topup_hyp_memcache(*memcache, min_pages);
+}
+
+/*
+ * Potentially reduce shadow S2 permissions to match the guest's own S2. For
+ * exec faults, we'd only reach this point if the guest actually allowed it (see
+ * kvm_s2_handle_perm_fault).
+ *
+ * Also encode the level of the original translation in the SW bits of the leaf
+ * entry as a proxy for the span of that translation. This will be retrieved on
+ * TLB invalidation from the guest and used to limit the invalidation scope if a
+ * TTL hint or a range isn't provided.
+ */
+static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
+				      enum kvm_pgtable_prot *prot,
+				      bool *writable)
+{
+	*writable &= kvm_s2_trans_writable(nested);
+	if (!kvm_s2_trans_readable(nested))
+		*prot &= ~KVM_PGTABLE_PROT_R;
+
+	*prot |= kvm_encode_nested_level(nested);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  bool fault_is_perm)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
+	bool topup_memcache;
+	bool write_fault, writable;
 	bool exec_fault, mte_allowed;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
@@ -1488,6 +1531,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
+	bool force_pte = logging_active;
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
@@ -1498,17 +1542,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
-	VM_BUG_ON(write_fault && exec_fault);
-
-	if (fault_is_perm && !write_fault && !exec_fault) {
-		kvm_err("Unexpected L2 read permission error\n");
-		return -EFAULT;
-	}
-
-	if (!is_protected_kvm_enabled())
-		memcache = &vcpu->arch.mmu_page_cache;
-	else
-		memcache = &vcpu->arch.pkvm_memcache;
+	VM_WARN_ON_ONCE(write_fault && exec_fault);
 
 	/*
 	 * Permission faults just need to update the existing leaf entry,
@@ -1516,17 +1550,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * only exception to this is when dirty logging is enabled at runtime
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
-	if (!fault_is_perm || (logging_active && write_fault)) {
-		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
-
-		if (!is_protected_kvm_enabled())
-			ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
-		else
-			ret = topup_hyp_memcache(memcache, min_pages);
-
-		if (ret)
-			return ret;
-	}
+	topup_memcache = !fault_is_perm || (logging_active && write_fault);
+	ret = prepare_mmu_memcache(vcpu, topup_memcache, &memcache);
+	if (ret)
+		return ret;
 
 	/*
 	 * Let's check if we will get back a huge page backed by hugetlbfs, or
@@ -1540,16 +1567,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
-	/*
-	 * logging_active is guaranteed to never be true for VM_PFNMAP
-	 * memslots.
-	 */
-	if (logging_active) {
-		force_pte = true;
+	if (force_pte)
 		vma_shift = PAGE_SHIFT;
-	} else {
+	else
 		vma_shift = get_vma_page_shift(vma, hva);
-	}
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -1601,7 +1622,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			max_map_size = PAGE_SIZE;
 
 		force_pte = (max_map_size == PAGE_SIZE);
-		vma_pagesize = min(vma_pagesize, (long)max_map_size);
+		vma_pagesize = min_t(long, vma_pagesize, max_map_size);
 	}
 
 	/*
@@ -1630,7 +1651,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
 	 * with the smp_wmb() in kvm_mmu_invalidate_end().
 	 */
-	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+	mmu_seq = kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
 	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
@@ -1665,24 +1686,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault && device)
 		return -ENOEXEC;
 
-	/*
-	 * Potentially reduce shadow S2 permissions to match the guest's own
-	 * S2. For exec faults, we'd only reach this point if the guest
-	 * actually allowed it (see kvm_s2_handle_perm_fault).
-	 *
-	 * Also encode the level of the original translation in the SW bits
-	 * of the leaf entry as a proxy for the span of that translation.
-	 * This will be retrieved on TLB invalidation from the guest and
-	 * used to limit the invalidation scope if a TTL hint or a range
-	 * isn't provided.
-	 */
-	if (nested) {
-		writable &= kvm_s2_trans_writable(nested);
-		if (!kvm_s2_trans_readable(nested))
-			prot &= ~KVM_PGTABLE_PROT_R;
-
-		prot |= kvm_encode_nested_level(nested);
-	}
+	if (nested)
+		adjust_nested_fault_perms(nested, &prot, &writable);
 
 	kvm_fault_lock(kvm);
 	pgt = vcpu->arch.hw_mmu->pgt;
@@ -1953,6 +1958,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
+	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
+			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
+
 	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
 			     esr_fsc_is_permission_fault(esr));
 	if (ret == 0)
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (14 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 15/21] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-22 12:31   ` Kunwu Chan
  2025-07-23  8:26   ` Marc Zyngier
  2025-07-17 16:27 ` [PATCH v15 17/21] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd Fuad Tabba
                   ` (4 subsequent siblings)
  20 siblings, 2 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Add arm64 architecture support for handling guest page faults on memory
slots backed by guest_memfd.

This change introduces a new function, gmem_abort(), which encapsulates
the fault handling logic specific to guest_memfd-backed memory. The
kvm_handle_guest_abort() entry point is updated to dispatch to
gmem_abort() when a fault occurs on a guest_memfd-backed memory slot (as
determined by kvm_slot_has_gmem()).

Until guest_memfd gains support for huge pages, the fault granule for
these memory regions is restricted to PAGE_SIZE.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 86 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 83 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b3eacb400fab..8c82df80a835 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1512,6 +1512,82 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
 	*prot |= kvm_encode_nested_level(nested);
 }
 
+#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
+
+static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+		      struct kvm_s2_trans *nested,
+		      struct kvm_memory_slot *memslot, bool is_perm)
+{
+	bool write_fault, exec_fault, writable;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+	unsigned long mmu_seq;
+	struct page *page;
+	struct kvm *kvm = vcpu->kvm;
+	void *memcache;
+	kvm_pfn_t pfn;
+	gfn_t gfn;
+	int ret;
+
+	ret = prepare_mmu_memcache(vcpu, true, &memcache);
+	if (ret)
+		return ret;
+
+	if (nested)
+		gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
+	else
+		gfn = fault_ipa >> PAGE_SHIFT;
+
+	write_fault = kvm_is_write_fault(vcpu);
+	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
+
+	VM_WARN_ON_ONCE(write_fault && exec_fault);
+
+	mmu_seq = kvm->mmu_invalidate_seq;
+	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
+	smp_rmb();
+
+	ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
+	if (ret) {
+		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
+					      write_fault, exec_fault, false);
+		return ret;
+	}
+
+	writable = !(memslot->flags & KVM_MEM_READONLY);
+
+	if (nested)
+		adjust_nested_fault_perms(nested, &prot, &writable);
+
+	if (writable)
+		prot |= KVM_PGTABLE_PROT_W;
+
+	if (exec_fault ||
+	    (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
+	     (!nested || kvm_s2_trans_executable(nested))))
+		prot |= KVM_PGTABLE_PROT_X;
+
+	kvm_fault_lock(kvm);
+	if (mmu_invalidate_retry(kvm, mmu_seq)) {
+		ret = -EAGAIN;
+		goto out_unlock;
+	}
+
+	ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
+						 __pfn_to_phys(pfn), prot,
+						 memcache, flags);
+
+out_unlock:
+	kvm_release_faultin_page(kvm, page, !!ret, writable);
+	kvm_fault_unlock(kvm);
+
+	if (writable && !ret)
+		mark_page_dirty_in_slot(kvm, memslot, gfn);
+
+	return ret != -EAGAIN ? ret : 0;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1536,7 +1612,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
-	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
 
 	if (fault_is_perm)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1961,8 +2037,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
 			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
 
-	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
-			     esr_fsc_is_permission_fault(esr));
+	if (kvm_slot_has_gmem(memslot))
+		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
+				 esr_fsc_is_permission_fault(esr));
+	else
+		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
+				     esr_fsc_is_permission_fault(esr));
 	if (ret == 0)
 		ret = 1;
 out:
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 17/21] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (15 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-23  8:29   ` Marc Zyngier
  2025-07-17 16:27 ` [PATCH v15 18/21] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Handle faults triggered by VNCR_EL2 for memslots backed by guest_memfd
in arm64 nested virtualization.

* Introduce an is_gmem output parameter to kvm_translate_vncr(), indicating
  whether the faulted memory slot is backed by guest_memfd.

* Dispatch faults backed by guest_memfd to kvm_gmem_get_pfn().

* Update kvm_handle_vncr_abort() to handle potential guest_memfd errors.
  Some guest_memfd errors must be handled by userspace rather than
  implicitly retried by returning to the guest.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/nested.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index dc1d26559bfa..b3edd7f7c8cd 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -1172,8 +1172,9 @@ static u64 read_vncr_el2(struct kvm_vcpu *vcpu)
 	return (u64)sign_extend64(__vcpu_sys_reg(vcpu, VNCR_EL2), 48);
 }
 
-static int kvm_translate_vncr(struct kvm_vcpu *vcpu)
+static int kvm_translate_vncr(struct kvm_vcpu *vcpu, bool *is_gmem)
 {
+	struct kvm_memory_slot *memslot;
 	bool write_fault, writable;
 	unsigned long mmu_seq;
 	struct vncr_tlb *vt;
@@ -1216,10 +1217,25 @@ static int kvm_translate_vncr(struct kvm_vcpu *vcpu)
 	smp_rmb();
 
 	gfn = vt->wr.pa >> PAGE_SHIFT;
-	pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writable, &page);
-	if (is_error_noslot_pfn(pfn) || (write_fault && !writable))
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot)
 		return -EFAULT;
 
+	*is_gmem = kvm_slot_has_gmem(memslot);
+	if (!*is_gmem) {
+		pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
+					&writable, &page);
+		if (is_error_noslot_pfn(pfn) || (write_fault && !writable))
+			return -EFAULT;
+	} else {
+		ret = kvm_gmem_get_pfn(vcpu->kvm, memslot, gfn, &pfn, &page, NULL);
+		if (ret) {
+			kvm_prepare_memory_fault_exit(vcpu, vt->wr.pa, PAGE_SIZE,
+					      write_fault, false, false);
+			return ret;
+		}
+	}
+
 	scoped_guard(write_lock, &vcpu->kvm->mmu_lock) {
 		if (mmu_invalidate_retry(vcpu->kvm, mmu_seq))
 			return -EAGAIN;
@@ -1292,23 +1308,36 @@ int kvm_handle_vncr_abort(struct kvm_vcpu *vcpu)
 	if (esr_fsc_is_permission_fault(esr)) {
 		inject_vncr_perm(vcpu);
 	} else if (esr_fsc_is_translation_fault(esr)) {
-		bool valid;
+		bool valid, is_gmem = false;
 		int ret;
 
 		scoped_guard(read_lock, &vcpu->kvm->mmu_lock)
 			valid = kvm_vncr_tlb_lookup(vcpu);
 
 		if (!valid)
-			ret = kvm_translate_vncr(vcpu);
+			ret = kvm_translate_vncr(vcpu, &is_gmem);
 		else
 			ret = -EPERM;
 
 		switch (ret) {
 		case -EAGAIN:
-		case -ENOMEM:
 			/* Let's try again... */
 			break;
+		case -ENOMEM:
+			/*
+			 * For guest_memfd, this indicates that it failed to
+			 * create a folio to back the memory. Inform userspace.
+			 */
+			if (is_gmem)
+				return 0;
+			/* Otherwise, let's try again... */
+			break;
 		case -EFAULT:
+		case -EIO:
+		case -EHWPOISON:
+			if (is_gmem)
+				return 0;
+			fallthrough;
 		case -EINVAL:
 		case -ENOENT:
 		case -EACCES:
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 18/21] KVM: arm64: Enable host mapping of shared guest_memfd memory
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (16 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 17/21] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-23  8:33   ` Marc Zyngier
  2025-07-17 16:27 ` [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP Fuad Tabba
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Enable host userspace mmap support for guest_memfd-backed memory on
arm64, allowing the host to map guest memory directly from guest_memfd:

* Define kvm_arch_supports_gmem_mmap() for arm64: The
  kvm_arch_supports_gmem_mmap() macro is defined for arm64 to be true if
  CONFIG_KVM_GMEM_SUPPORTS_MMAP is enabled. For existing arm64 KVM VM
  types that support guest_memfd, this enables them to use guest_memfd
  with host userspace mappings. This provides consistent behavior, since
  there are currently no arm64 CoCo VMs that rely on guest_memfd solely
  for private, non-mappable memory. Future arm64 VM types can override
  or restrict this behavior via the kvm_arch_supports_gmem_mmap() hook
  if needed.

* Select CONFIG_KVM_GMEM_SUPPORTS_MMAP in arm64 Kconfig.

* Enforce KVM_MEMSLOT_GMEM_ONLY for guest_memfd on arm64: Checks are
  added to ensure that if guest_memfd is enabled on arm64,
  KVM_GMEM_SUPPORTS_MMAP is also enabled. This means
  guest_memfd-backed memory slots on arm64 are currently only supported
  if they are intended for shared memory use cases (i.e.,
  kvm_memslot_is_gmem_only() is true). This design reflects the current
  arm64 KVM ecosystem where guest_memfd is primarily being introduced
  for VMs that support shared memory.

Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 4 ++++
 arch/arm64/kvm/Kconfig            | 2 ++
 arch/arm64/kvm/mmu.c              | 7 +++++++
 3 files changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3e41a880b062..63f7827cfa1b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1674,5 +1674,9 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
 void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
 void check_feature_map(void);
 
+#ifdef CONFIG_KVM_GMEM
+#define kvm_arch_supports_gmem(kvm) true
+#define kvm_arch_supports_gmem_mmap(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP)
+#endif
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 713248f240e0..323b46b7c82f 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -37,6 +37,8 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
+	select KVM_GMEM
+	select KVM_GMEM_SUPPORTS_MMAP
 	help
 	  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8c82df80a835..85559b8a0845 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2276,6 +2276,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT))
 		return -EFAULT;
 
+	/*
+	 * Only support guest_memfd backed memslots with mappable memory, since
+	 * there aren't any CoCo VMs that support only private memory on arm64.
+	 */
+	if (kvm_slot_has_gmem(new) && !kvm_memslot_is_gmem_only(new))
+		return -EINVAL;
+
 	hva = new->userspace_addr;
 	reg_end = hva + (new->npages << PAGE_SHIFT);
 
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (17 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 18/21] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-18  6:14   ` Xiaoyao Li
  2025-07-21 17:31   ` Sean Christopherson
  2025-07-17 16:27 ` [PATCH v15 20/21] KVM: selftests: Do not use hardcoded page sizes in guest_memfd test Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 21/21] KVM: selftests: guest_memfd mmap() test when mmap is supported Fuad Tabba
  20 siblings, 2 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Introduce the new KVM capability KVM_CAP_GMEM_MMAP. This capability
signals to userspace that a KVM instance supports host userspace mapping
of guest_memfd-backed memory.

The availability of this capability is determined per architecture, and
its enablement for a specific guest_memfd instance is controlled by the
GUEST_MEMFD_FLAG_MMAP flag at creation time.

Update the KVM API documentation to detail the KVM_CAP_GMEM_MMAP
capability, the associated GUEST_MEMFD_FLAG_MMAP, and provide essential
information regarding support for mmap in guest_memfd.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 Documentation/virt/kvm/api.rst | 9 +++++++++
 include/uapi/linux/kvm.h       | 1 +
 virt/kvm/kvm_main.c            | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 43ed57e048a8..5169066b53b2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6407,6 +6407,15 @@ most one mapping per page, i.e. binding multiple memory regions to a single
 guest_memfd range is not allowed (any number of memory regions can be bound to
 a single guest_memfd file, but the bound ranges must not overlap).
 
+When the capability KVM_CAP_GMEM_MMAP is supported, the 'flags' field supports
+GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation enables mmap()
+and faulting of guest_memfd memory to host userspace.
+
+When the KVM MMU performs a PFN lookup to service a guest fault and the backing
+guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
+consumed from guest_memfd, regardless of whether it is a shared or a private
+fault.
+
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
 4.143 KVM_PRE_FAULT_MEMORY
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3beafbf306af..698dd407980f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -960,6 +960,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_ARM_EL2 240
 #define KVM_CAP_ARM_EL2_E2H0 241
 #define KVM_CAP_RISCV_MP_STATE_RESET 242
+#define KVM_CAP_GMEM_MMAP 243
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 46bddac1dacd..f1ac872e01e9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4916,6 +4916,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_supports_gmem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_SUPPORTS_MMAP
+	case KVM_CAP_GMEM_MMAP:
+		return !kvm || kvm_arch_supports_gmem_mmap(kvm);
 #endif
 	default:
 		break;
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 20/21] KVM: selftests: Do not use hardcoded page sizes in guest_memfd test
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (18 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  2025-07-17 16:27 ` [PATCH v15 21/21] KVM: selftests: guest_memfd mmap() test when mmap is supported Fuad Tabba
  20 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Update the guest_memfd_test selftest to use getpagesize() instead of
hardcoded 4KB page size values.

Using hardcoded page sizes can cause test failures on architectures or
systems configured with larger page sizes, such as arm64 with 64KB
pages. By dynamically querying the system's page size, the test becomes
more portable and robust across different environments.

Additionally, build the guest_memfd_test selftest for arm64.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Suggested-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm       |  1 +
 tools/testing/selftests/kvm/guest_memfd_test.c | 11 ++++++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 38b95998e1e6..e11ed9e59ab5 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -172,6 +172,7 @@ TEST_GEN_PROGS_arm64 += arch_timer
 TEST_GEN_PROGS_arm64 += coalesced_io_test
 TEST_GEN_PROGS_arm64 += dirty_log_perf_test
 TEST_GEN_PROGS_arm64 += get-reg-list
+TEST_GEN_PROGS_arm64 += guest_memfd_test
 TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
 TEST_GEN_PROGS_arm64 += memslot_perf_test
 TEST_GEN_PROGS_arm64 += mmu_stress_test
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..341ba616cf55 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -146,24 +146,25 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 {
 	int fd1, fd2, ret;
 	struct stat st1, st2;
+	size_t page_size = getpagesize();
 
-	fd1 = __vm_create_guest_memfd(vm, 4096, 0);
+	fd1 = __vm_create_guest_memfd(vm, page_size, 0);
 	TEST_ASSERT(fd1 != -1, "memfd creation should succeed");
 
 	ret = fstat(fd1, &st1);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st1.st_size == 4096, "memfd st_size should match requested size");
+	TEST_ASSERT(st1.st_size == page_size, "memfd st_size should match requested size");
 
-	fd2 = __vm_create_guest_memfd(vm, 8192, 0);
+	fd2 = __vm_create_guest_memfd(vm, page_size * 2, 0);
 	TEST_ASSERT(fd2 != -1, "memfd creation should succeed");
 
 	ret = fstat(fd2, &st2);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st2.st_size == 8192, "second memfd st_size should match requested size");
+	TEST_ASSERT(st2.st_size == page_size * 2, "second memfd st_size should match requested size");
 
 	ret = fstat(fd1, &st1);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st1.st_size == 4096, "first memfd st_size should still match requested size");
+	TEST_ASSERT(st1.st_size == page_size, "first memfd st_size should still match requested size");
 	TEST_ASSERT(st1.st_ino != st2.st_ino, "different memfd should have different inode numbers");
 
 	close(fd2);
-- 
2.50.0.727.gbf7dc18ff4-goog




* [PATCH v15 21/21] KVM: selftests: guest_memfd mmap() test when mmap is supported
  2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
                   ` (19 preceding siblings ...)
  2025-07-17 16:27 ` [PATCH v15 20/21] KVM: selftests: Do not use hardcoded page sizes in guest_memfd test Fuad Tabba
@ 2025-07-17 16:27 ` Fuad Tabba
  20 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-17 16:27 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny, tabba

Expand the guest_memfd selftests to comprehensively test host userspace
mmap functionality for guest_memfd-backed memory when supported by the
VM type.

Introduce new test cases to verify the following:

* Successful mmap operations: Ensure that MAP_SHARED mappings succeed
  when guest_memfd mmap is enabled.

* Data integrity: Validate that data written to the mmap'd region is
  correctly persisted and readable.

* fallocate interaction: Test that fallocate(FALLOC_FL_PUNCH_HOLE)
  correctly zeros out mapped pages.

* Out-of-bounds access: Verify that accessing memory beyond the
  guest_memfd's size correctly triggers a SIGBUS signal.

* Unsupported mmap: Confirm that mmap attempts fail as expected when
  guest_memfd mmap support is not enabled for the specific guest_memfd
  instance or VM type.

* Flag validity: Introduce test_vm_type_gmem_flag_validity() to
  systematically test that only allowed guest_memfd creation flags are
  accepted for different VM types (e.g., GUEST_MEMFD_FLAG_MMAP for
  default VMs, no flags for CoCo VMs).

The existing tests for guest_memfd creation (multiple instances, invalid
sizes), file read/write, file size, and invalid punch hole operations
are integrated into the new test_with_type() framework to allow testing
across different VM types.

Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 197 ++++++++++++++++--
 1 file changed, 176 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 341ba616cf55..1252e74fbb8f 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -13,6 +13,8 @@
 
 #include <linux/bitmap.h>
 #include <linux/falloc.h>
+#include <setjmp.h>
+#include <signal.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -34,12 +36,83 @@ static void test_file_read_write(int fd)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
+{
+	const char val = 0xaa;
+	char *mem;
+	size_t i;
+	int ret;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+	TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+
+	memset(mem, val, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			page_size);
+	TEST_ASSERT(!ret, "fallocate the first page should succeed.");
+
+	for (i = 0; i < page_size; i++)
+		TEST_ASSERT_EQ(READ_ONCE(mem[i]), 0x00);
+	for (; i < total_size; i++)
+		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
+
+	memset(mem, val, page_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap() should succeed.");
+}
+
+static sigjmp_buf jmpbuf;
+void fault_sigbus_handler(int signum)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
+{
+	struct sigaction sa_old, sa_new = {
+		.sa_handler = fault_sigbus_handler,
+	};
+	size_t map_size = total_size * 4;
+	const char val = 0xaa;
+	char *mem;
+	size_t i;
+	int ret;
+
+	mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+
+	sigaction(SIGBUS, &sa_new, &sa_old);
+	if (sigsetjmp(jmpbuf, 1) == 0) {
+		memset(mem, 0xaa, map_size);
+		TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
+	}
+	sigaction(SIGBUS, &sa_old, NULL);
+
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
+
+	ret = munmap(mem, map_size);
+	TEST_ASSERT(!ret, "munmap() should succeed.");
+}
+
+static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
 {
 	char *mem;
 
 	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 	TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, MAP_FAILED);
 }
 
 static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -120,26 +193,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
 	}
 }
 
-static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
+static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
+						  uint64_t guest_memfd_flags,
+						  size_t page_size)
 {
-	size_t page_size = getpagesize();
-	uint64_t flag;
 	size_t size;
 	int fd;
 
 	for (size = 1; size < page_size; size++) {
-		fd = __vm_create_guest_memfd(vm, size, 0);
-		TEST_ASSERT(fd == -1 && errno == EINVAL,
+		fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
+		TEST_ASSERT(fd < 0 && errno == EINVAL,
 			    "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
 			    size);
 	}
-
-	for (flag = BIT(0); flag; flag <<= 1) {
-		fd = __vm_create_guest_memfd(vm, page_size, flag);
-		TEST_ASSERT(fd == -1 && errno == EINVAL,
-			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
-			    flag);
-	}
 }
 
 static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
@@ -171,30 +237,119 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
 
-int main(int argc, char *argv[])
+static bool check_vm_type(unsigned long vm_type)
 {
-	size_t page_size;
+	/*
+	 * Not all architectures support KVM_CAP_VM_TYPES. However, those that
+	 * support guest_memfd have that support for the default VM type.
+	 */
+	if (vm_type == VM_TYPE_DEFAULT)
+		return true;
+
+	return kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type);
+}
+
+static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
+			   bool expect_mmap_allowed)
+{
+	struct kvm_vm *vm;
 	size_t total_size;
+	size_t page_size;
 	int fd;
-	struct kvm_vm *vm;
 
-	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+	if (!check_vm_type(vm_type))
+		return;
 
 	page_size = getpagesize();
 	total_size = page_size * 4;
 
-	vm = vm_create_barebones();
+	vm = vm_create_barebones_type(vm_type);
 
-	test_create_guest_memfd_invalid(vm);
 	test_create_guest_memfd_multiple(vm);
+	test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
 
-	fd = vm_create_guest_memfd(vm, total_size, 0);
+	fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
 
 	test_file_read_write(fd);
-	test_mmap(fd, page_size);
+
+	if (expect_mmap_allowed) {
+		test_mmap_supported(fd, page_size, total_size);
+		test_fault_overflow(fd, page_size, total_size);
+
+	} else {
+		test_mmap_not_supported(fd, page_size, total_size);
+	}
+
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
 
 	close(fd);
+	kvm_vm_free(vm);
+}
+
+static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
+					    uint64_t expected_valid_flags)
+{
+	size_t page_size = getpagesize();
+	struct kvm_vm *vm;
+	uint64_t flag = 0;
+	int fd;
+
+	if (!check_vm_type(vm_type))
+		return;
+
+	vm = vm_create_barebones_type(vm_type);
+
+	for (flag = BIT(0); flag; flag <<= 1) {
+		fd = __vm_create_guest_memfd(vm, page_size, flag);
+
+		if (flag & expected_valid_flags) {
+			TEST_ASSERT(fd >= 0,
+				    "guest_memfd() with flag '0x%lx' should be valid",
+				    flag);
+			close(fd);
+		} else {
+			TEST_ASSERT(fd < 0 && errno == EINVAL,
+				    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
+				    flag);
+		}
+	}
+
+	kvm_vm_free(vm);
+}
+
+static void test_gmem_flag_validity(void)
+{
+	uint64_t non_coco_vm_valid_flags = 0;
+
+	if (kvm_has_cap(KVM_CAP_GMEM_MMAP))
+		non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_MMAP;
+
+	test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
+
+#ifdef __x86_64__
+	test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
+	test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
+#endif
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+
+	test_gmem_flag_validity();
+
+	test_with_type(VM_TYPE_DEFAULT, 0, false);
+	if (kvm_has_cap(KVM_CAP_GMEM_MMAP)) {
+		test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_MMAP,
+			       true);
+	}
+
+#ifdef __x86_64__
+	test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
+#endif
 }
-- 
2.50.0.727.gbf7dc18ff4-goog



^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem()
  2025-07-17 16:27 ` [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem() Fuad Tabba
@ 2025-07-18  1:42   ` Xiaoyao Li
  2025-07-21 14:47     ` Sean Christopherson
  2025-07-21 14:55     ` Fuad Tabba
  2025-07-21 16:44   ` Sean Christopherson
  1 sibling, 2 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  1:42 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> -/* SMM is currently unsupported for guests with private memory. */
> +/* SMM is currently unsupported for guests with guest_memfd private memory. */
>   # define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)

As I commented on v14, please don't change the comment.

It is checking kvm_arch_has_private_mem(), *not* 
kvm_arch_supports_gmem(). So why bother mentioning guest_memfd here?



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 08/21] KVM: guest_memfd: Allow host to map guest_memfd pages
  2025-07-17 16:27 ` [PATCH v15 08/21] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
@ 2025-07-18  2:56   ` Xiaoyao Li
  0 siblings, 0 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  2:56 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1ec71648824c..9ac21985f3b5 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -740,6 +740,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>   }
>   #endif
>   
> +/*
> + * Returns true if this VM supports mmap() in guest_memfd.
> + *
> + * Arch code must define kvm_arch_supports_gmem_mmap if support for guest_memfd
> + * is enabled.
> + */
> +#if !defined(kvm_arch_supports_gmem_mmap)

Unless given a reason otherwise, or you change it to

#if !defined(kvm_arch_supports_gmem_mmap) && !IS_ENABLED(CONFIG_KVM_GMEM)

I will repeat my opinion from v14[*]:

* It describes a similar requirement to kvm_arch_has_private_mem and
* kvm_arch_supports_gmem, but it doesn't have the check of
*
*	&& !IS_ENABLED(CONFIG_KVM_GMEM)
*
* so it's natural for people to wonder why.
*
* I would suggest just adding the !IS_ENABLED(CONFIG_KVM_GMEM) check,
* as is done for kvm_arch_has_private_mem and kvm_arch_supports_gmem.
* That way a compilation error is raised if any ARCH enables
* CONFIG_KVM_GMEM without defining kvm_arch_supports_gmem_mmap.

[*] https://lore.kernel.org/all/e1470c54-fe2b-4fdf-9b4b-ce9ef0d04a1b@intel.com/
  > +static inline bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
> +{
> +	return false;
> +}
> +#endif
> +



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 09/21] KVM: guest_memfd: Track guest_memfd mmap support in memslot
  2025-07-17 16:27 ` [PATCH v15 09/21] KVM: guest_memfd: Track guest_memfd mmap support in memslot Fuad Tabba
@ 2025-07-18  3:33   ` Xiaoyao Li
  0 siblings, 0 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  3:33 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> Add a new internal flag, KVM_MEMSLOT_GMEM_ONLY, to the top half of
> memslot->flags (which makes it strictly for KVM's internal use). This
> flag tracks when a guest_memfd-backed memory slot supports host
> userspace mmap operations, which implies that all memory, not just
> private memory for CoCo VMs, is consumed through guest_memfd: "gmem
> only".
> 
> This optimization avoids repeatedly checking the underlying guest_memfd
> file for mmap support, which would otherwise require taking and
> releasing a reference on the file for each check. By caching this
> information directly in the memslot, we reduce overhead and simplify the
> logic involved in handling guest_memfd-backed pages for host mappings.
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  2025-07-17 16:27 ` [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level Fuad Tabba
@ 2025-07-18  5:10   ` Xiaoyao Li
  2025-07-21 23:17     ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  5:10 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
> pointer and rename it to kvm_gmem_max_mapping_level().
> 
> The max_mapping_level x86 operation (previously private_max_mapping_level)
> is designed to potentially be called without an active page fault, for
> instance, when kvm_mmu_max_mapping_level() is determining the maximum
> mapping level for a gfn proactively.
> 
> Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
> safely handle a NULL fault argument. This aligns its interface with the
> kvm_x86_ops.max_mapping_level operation it wraps, which can also be
> called with NULL.

Are you sure about that?

Patch 09 just added the fault->is_private check for TDX and SEV.

> Rename function to kvm_gmem_max_mapping_level(): This reinforces that
> the function's scope is for guest_memfd-backed memory, which can be
> either private or non-private, removing any remaining "private"
> connotation from its name.
> 
> Optimize max_level checks: Introduce a check in the caller to skip
> querying for max_mapping_level if the current max_level is already
> PG_LEVEL_4K, as no further reduction is possible.
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
>   1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index bb925994cbc5..6bd28fda0fd3 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
>   	return PG_LEVEL_4K;
>   }
>   
> -static u8 kvm_max_private_mapping_level(struct kvm *kvm,
> -					struct kvm_page_fault *fault,
> -					int gmem_order)
> +static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> +				     struct kvm_page_fault *fault)
>   {
> -	u8 max_level = fault->max_level;
>   	u8 req_max_level;
> +	u8 max_level;
>   
> -	if (max_level == PG_LEVEL_4K)
> -		return PG_LEVEL_4K;
> -
> -	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> +	max_level = kvm_max_level_for_order(order);
>   	if (max_level == PG_LEVEL_4K)
>   		return PG_LEVEL_4K;
>   
> @@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
>   	}
>   
>   	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> -	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
> +	if (fault->max_level >= PG_LEVEL_4K)
> +		fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
> +							      max_order, fault);

I cannot understand why this change is required. In what case will 
fault->max_level < PG_LEVEL_4K?


>   	return RET_PF_CONTINUE;
>   }



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 12/21] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level
  2025-07-17 16:27 ` [PATCH v15 12/21] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level Fuad Tabba
@ 2025-07-18  5:32   ` Xiaoyao Li
  2025-07-18  5:57     ` Xiaoyao Li
  0 siblings, 1 reply; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  5:32 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> Modify kvm_mmu_max_mapping_level() to consult guest_memfd for memory
> regions backed by it when computing the maximum mapping level,
> especially during huge page recovery.

IMHO, we need to integrate the guest_memfd consultation into 
__kvm_mmu_max_mapping_level(), not kvm_mmu_max_mapping_level().

__kvm_mmu_max_mapping_level() (called by kvm_mmu_hugepage_adjust()) is 
the function KVM X86 uses to determine the final mapping level,
fault->goal_level.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 12/21] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level
  2025-07-18  5:32   ` Xiaoyao Li
@ 2025-07-18  5:57     ` Xiaoyao Li
  0 siblings, 0 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  5:57 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 1:32 PM, Xiaoyao Li wrote:
> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>>
>> Modify kvm_mmu_max_mapping_level() to consult guest_memfd for memory
>> regions backed by it when computing the maximum mapping level,
>> especially during huge page recovery.
> 
> IMHO, we need to integrate the guest_memfd consultation into 
> __kvm_mmu_max_mapping_level(), not kvm_mmu_max_mapping_level().
> 
> __kvm_mmu_max_mapping_level() (called by kvm_mmu_hugepage_adjust()) is 
> the function KVM X86 uses to determine the final mapping level,
> fault->goal_level.
> 

I think I can understand the patch now.

For a normal TDP page fault, KVM needs to set up the TDP page table to 
map the guest memory. The max page level of guest_memfd is consulted 
when faulting in the pfn in kvm_mmu_faultin_pfn_private(), which updates 
fault->max_level accordingly, so skipping the consultation in 
__kvm_mmu_max_mapping_level() is OK.

But recover_huge_pages_range() and kvm_mmu_zap_collapsible_spte() (this 
patch misses that case) call kvm_mmu_max_mapping_level() without any 
information about the max page level of guest_memfd, so we need to 
consult guest_memfd separately there.

But the changelog doesn't explain it that way.



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-07-17 16:27 ` [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
@ 2025-07-18  6:09   ` Xiaoyao Li
  2025-07-21 16:47   ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  6:09 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> Update the KVM MMU fault handler to service guest page faults
> for memory slots backed by guest_memfd with mmap support. For such
> slots, the MMU must always fault in pages directly from guest_memfd,
> bypassing the host's userspace_addr.
> 
> This ensures that guest_memfd-backed memory is always handled through
> the guest_memfd specific faulting path, regardless of whether it's for
> private or non-private (shared) use cases.
> 
> Additionally, rename kvm_mmu_faultin_pfn_private() to
> kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in
> pages from guest_memfd for both private and non-private memory,
> accommodating the new use cases.
> 
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>


Note to myself:

After this patch, it looks possible that 
kvm_mmu_prepare_memory_fault_exit() in kvm_mmu_faultin_pfn_gmem() might 
be triggered for guest_memfd with mmap support, though I'm not sure if 
there is a real case that triggers it.

This will require some changes in QEMU when it adds support for 
guest_memfd mmap, since current QEMU handles KVM_EXIT_MEMORY_FAULT by 
always converting the memory attribute.

> ---
>   arch/x86/kvm/mmu/mmu.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 94be15cde6da..ad5f337b496c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4511,8 +4511,8 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
>   				 r == RET_PF_RETRY, fault->map_writable);
>   }
>   
> -static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> -				       struct kvm_page_fault *fault)
> +static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> +				    struct kvm_page_fault *fault)
>   {
>   	int max_order, r;
>   
> @@ -4536,13 +4536,18 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
>   	return RET_PF_CONTINUE;
>   }
>   
> +static bool fault_from_gmem(struct kvm_page_fault *fault)
> +{
> +	return fault->is_private || kvm_memslot_is_gmem_only(fault->slot);
> +}
> +
>   static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>   				 struct kvm_page_fault *fault)
>   {
>   	unsigned int foll = fault->write ? FOLL_WRITE : 0;
>   
> -	if (fault->is_private)
> -		return kvm_mmu_faultin_pfn_private(vcpu, fault);
> +	if (fault_from_gmem(fault))
> +		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>   
>   	foll |= FOLL_NOWAIT;
>   	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-17 16:27 ` [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type Fuad Tabba
@ 2025-07-18  6:10   ` Xiaoyao Li
  2025-07-21 12:22   ` Xiaoyao Li
  1 sibling, 0 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  6:10 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> Enable host userspace mmap support for guest_memfd-backed memory when
> running KVM with the KVM_X86_DEFAULT_VM type:
> 
> * Define kvm_arch_supports_gmem_mmap() for KVM_X86_DEFAULT_VM: Introduce
>    the architecture-specific kvm_arch_supports_gmem_mmap() macro,
>    specifically enabling mmap support for KVM_X86_DEFAULT_VM instances.
>    This macro, gated by CONFIG_KVM_GMEM_SUPPORTS_MMAP, ensures that only
>    the default VM type can leverage guest_memfd mmap functionality on
>    x86. This explicit enablement prevents CoCo VMs, which use guest_memfd
>    primarily for private memory and rely on hardware-enforced privacy,
>    from accidentally exposing guest memory via host userspace mappings.
> 
> * Select CONFIG_KVM_GMEM_SUPPORTS_MMAP in KVM_X86: Enable the
>    CONFIG_KVM_GMEM_SUPPORTS_MMAP Kconfig option when KVM_X86 is selected.
>    This ensures that the necessary code for guest_memfd mmap support
>    (introduced earlier) is compiled into the kernel for x86. This Kconfig
>    option acts as a system-wide gate for the guest_memfd mmap capability.
>    It implicitly enables CONFIG_KVM_GMEM, making guest_memfd available,
>    and then layers the mmap capability on top specifically for the
>    default VM.
> 
> These changes make guest_memfd a more versatile memory backing for
> standard KVM guests, allowing VMMs to use a unified guest_memfd model
> for both private (CoCo) and non-private (default) VMs. This is a
> prerequisite for use cases such as running Firecracker guests entirely
> backed by guest_memfd and implementing direct map removal for non-CoCo
> VMs.
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP
  2025-07-17 16:27 ` [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP Fuad Tabba
@ 2025-07-18  6:14   ` Xiaoyao Li
  2025-07-21 17:31   ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  6:14 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> Introduce the new KVM capability KVM_CAP_GMEM_MMAP. This capability
> signals to userspace that a KVM instance supports host userspace mapping
> of guest_memfd-backed memory.
> 
> The availability of this capability is determined per architecture, and
> its enablement for a specific guest_memfd instance is controlled by the
> GUEST_MEMFD_FLAG_MMAP flag at creation time.
> 
> Update the KVM API documentation to detail the KVM_CAP_GMEM_MMAP
> capability, the associated GUEST_MEMFD_FLAG_MMAP, and provide essential
> information regarding support for mmap in guest_memfd.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

Though I have comments on some patches, the general functionality for 
x86 seems to work. I plan to do a POC with QEMU to test a non-CoCo VM 
with guest_memfd mmap support as the memory backend.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level
  2025-07-17 16:27 ` [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level Fuad Tabba
@ 2025-07-18  6:19   ` Xiaoyao Li
  2025-07-21 19:46   ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-18  6:19 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> Generalize the private_max_mapping_level x86 operation to
> max_mapping_level.
> 
> The private_max_mapping_level operation allows platform-specific code to
> limit mapping levels (e.g., forcing 4K pages for certain memory types).
> While it was previously used exclusively for private memory, guest_memfd
> can now back both private and non-private memory. Platforms may have
> specific mapping level restrictions that apply to guest_memfd memory
> regardless of its privacy attribute. Therefore, generalize this
> operation.
> 
> Rename the operation: Removes the "private" prefix to reflect its
> broader applicability to any guest_memfd-backed memory.
> 
> Pass kvm_page_fault information: The operation is updated to receive a
> struct kvm_page_fault object instead of just the pfn. This provides
> platform-specific implementations (e.g., for TDX or SEV) with additional
> context about the fault, such as whether it is private or shared,
> allowing them to apply different mapping level rules as needed.
> 
> Enforce "private-only" behavior (for now): Since the current consumers
> of this hook (TDX and SEV) still primarily use it to enforce private
> memory constraints, platform-specific implementations are made to return
> 0 for non-private pages. A return value of 0 signals to callers that
> platform-specific input should be ignored for that particular fault,
> indicating no specific platform-imposed mapping level limits for
> non-private pages. This allows the core MMU to continue determining the
> mapping level based on generic rules for such cases.
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-17 16:27 ` [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type Fuad Tabba
  2025-07-18  6:10   ` Xiaoyao Li
@ 2025-07-21 12:22   ` Xiaoyao Li
  2025-07-21 12:41     ` Fuad Tabba
                       ` (2 more replies)
  1 sibling, 3 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-21 12:22 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> +/*
> + * CoCo VMs with hardware support that use guest_memfd only for backing private
> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> + */
> +#define kvm_arch_supports_gmem_mmap(kvm)		\
> +	(IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&	\
> +	 (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)

I want to share my findings from doing the POC to enable gmem mmap in QEMU.

Actually, QEMU can use gmem with mmap support as normal memory even 
without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd 
on KVM_SET_USER_MEMORY_REGION2.

Since the gmem is mmapable, QEMU can pass the userspace addr obtained 
from mmap() on the gmem fd to kvm_userspace_memory_region(2).userspace_addr. 
This works well for non-CoCo VMs on x86.

Then it seems feasible to use gmem with mmap for the shared memory of 
TDX, and an additional gmem without mmap for the private memory. I.e., 
for struct kvm_userspace_memory_region2, @userspace_addr is passed the 
uaddr returned from mmap() on gmem0 (which has mmap support), while 
@guest_memfd is passed the fd of another gmem, gmem1, without mmap support.

However, this actually fails, because kvm_arch_supports_gmem_mmap() 
returns false for TDX VMs, which means userspace cannot allocate 
mmapable gmem just for the shared memory of a TDX VM.

So my question is: do we want to support such a case?


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 12:22   ` Xiaoyao Li
@ 2025-07-21 12:41     ` Fuad Tabba
  2025-07-21 13:45     ` Vishal Annapurve
  2025-07-22 14:28     ` Xiaoyao Li
  2 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-21 12:41 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Xiaoyao,

On Mon, 21 Jul 2025 at 13:22, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > +/*
> > + * CoCo VMs with hardware support that use guest_memfd only for backing private
> > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> > + */
> > +#define kvm_arch_supports_gmem_mmap(kvm)             \
> > +     (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&   \
> > +      (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
>
> I want to share the findings when I do the POC to enable gmem mmap in QEMU.
>
> Actually, QEMU can use gmem with mmap support as the normal memory even
> without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
> on KVM_SET_USER_MEMORY_REGION2.
>
> Since the gmem is mmapable, QEMU can pass the userspace addr got from
> mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
> works well for non-coco VMs on x86.
>
> Then it seems feasible to use gmem with mmap for the shared memory of
> TDX, and an additional gmem without mmap for the private memory. i.e.,
> For struct kvm_userspace_memory_region, the @userspace_addr is passed
> with the uaddr returned from gmem0 with mmap, while @guest_memfd is
> passed with another gmem1 fd without mmap.
>
> However, it fails actually, because the kvm_arch_suports_gmem_mmap()
> returns false for TDX VMs, which means userspace cannot allocate gmem
> with mmap just for shared memory for TDX.
>
> So my question is: do we want to support such a case?

Thanks for sharing this. To answer your question, no, we explicitly do
not want to support this feature for TDX, since TDX uses a completely
different paradigm.

Cheers,
/fuad


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 12:22   ` Xiaoyao Li
  2025-07-21 12:41     ` Fuad Tabba
@ 2025-07-21 13:45     ` Vishal Annapurve
  2025-07-21 14:42       ` Xiaoyao Li
  2025-07-21 14:42       ` Sean Christopherson
  2025-07-22 14:28     ` Xiaoyao Li
  2 siblings, 2 replies; 86+ messages in thread
From: Vishal Annapurve @ 2025-07-21 13:45 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, seanjc, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025 at 5:22 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > +/*
> > + * CoCo VMs with hardware support that use guest_memfd only for backing private
> > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> > + */
> > +#define kvm_arch_supports_gmem_mmap(kvm)             \
> > +     (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&   \
> > +      (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
>
> I want to share the findings when I do the POC to enable gmem mmap in QEMU.
>
> Actually, QEMU can use gmem with mmap support as the normal memory even
> without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
> on KVM_SET_USER_MEMORY_REGION2.
>
> Since the gmem is mmapable, QEMU can pass the userspace addr got from
> mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
> works well for non-coco VMs on x86.
>
> Then it seems feasible to use gmem with mmap for the shared memory of
> TDX, and an additional gmem without mmap for the private memory. i.e.,
> For struct kvm_userspace_memory_region, the @userspace_addr is passed
> with the uaddr returned from gmem0 with mmap, while @guest_memfd is
> passed with another gmem1 fd without mmap.
>
> However, it fails actually, because the kvm_arch_suports_gmem_mmap()
> returns false for TDX VMs, which means userspace cannot allocate gmem
> with mmap just for shared memory for TDX.

Why do you want such a usecase to work?

If kvm allows mappable guest_memfd files for TDX VMs without
conversion support, userspace will be able to use those for backing
private memory unless:
1) KVM checks at binding time if the guest_memfd passed during memslot
creation is not a mappable one and doesn't enforce "not mappable"
requirement for TDX VMs at creation time.
2) KVM fetches shared faults through userspace page tables and not
guest_memfd directly.

I don't see value in trying to go out of way to support such a usecase.

>
> So my question is: do we want to support such a case?



* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 13:45     ` Vishal Annapurve
@ 2025-07-21 14:42       ` Xiaoyao Li
  2025-07-21 14:42       ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-21 14:42 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, seanjc, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 7/21/2025 9:45 PM, Vishal Annapurve wrote:
> On Mon, Jul 21, 2025 at 5:22 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>
>> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
>>> +/*
>>> + * CoCo VMs with hardware support that use guest_memfd only for backing private
>>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
>>> + */
>>> +#define kvm_arch_supports_gmem_mmap(kvm)             \
>>> +     (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&   \
>>> +      (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
>>
>> I want to share the findings when I do the POC to enable gmem mmap in QEMU.
>>
>> Actually, QEMU can use gmem with mmap support as the normal memory even
>> without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
>> on KVM_SET_USER_MEMORY_REGION2.
>>
>> Since the gmem is mmapable, QEMU can pass the userspace addr got from
>> mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
>> works well for non-coco VMs on x86.
>>
>> Then it seems feasible to use gmem with mmap for the shared memory of
>> TDX, and an additional gmem without mmap for the private memory. i.e.,
>> For struct kvm_userspace_memory_region, the @userspace_addr is passed
>> with the uaddr returned from gmem0 with mmap, while @guest_memfd is
>> passed with another gmem1 fd without mmap.
>>
>> However, it fails actually, because the kvm_arch_suports_gmem_mmap()
>> returns false for TDX VMs, which means userspace cannot allocate gmem
>> with mmap just for shared memory for TDX.
> 
> Why do you want such a usecase to work?
> 
> If kvm allows mappable guest_memfd files for TDX VMs without
> conversion support, userspace will be able to use those for backing
> private memory unless:
> 1) KVM checks at binding time if the guest_memfd passed during memslot
> creation is not a mappable one and doesn't enforce "not mappable"
> requirement for TDX VMs at creation time.

yes, this is the additional change required.

> 2) KVM fetches shared faults through userspace page tables and not
> guest_memfd directly.

Current KVM supports this already.

And as I described above, userspace can currently just mmap() the gmem and 
pass the resulting address via userspace_addr without passing guest_memfd, 
to force KVM to fetch through userspace page tables.

So if we want KVM to fetch pages from guest_memfd directly, should we add 
something in KVM to enforce it?

> I don't see value in trying to go out of way to support such a usecase.

From my perspective, this usecase came up naturally when I tried to enable 
gmem mmap in QEMU. It seems a little strange that mmap'ed gmem can only 
serve as normal memory for non-CoCo VMs.

If KVM mandated that any gmem must be passed via 
kvm_userspace_memory_region2.guest_memfd, whether it is mmap'ed or not, it 
would make more sense (to me) not to support such a usecase, since there is 
only one guest_memfd field in struct kvm_userspace_memory_region2.

In any case, I just wanted to put it on the table, so people are aware of 
it and can make the call whether to support it or not. Either is OK to me.


* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 13:45     ` Vishal Annapurve
  2025-07-21 14:42       ` Xiaoyao Li
@ 2025-07-21 14:42       ` Sean Christopherson
  2025-07-21 15:07         ` Xiaoyao Li
  1 sibling, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 14:42 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Xiaoyao Li, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025, Vishal Annapurve wrote:
> On Mon, Jul 21, 2025 at 5:22 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> >
> > On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > > +/*
> > > + * CoCo VMs with hardware support that use guest_memfd only for backing private
> > > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> > > + */
> > > +#define kvm_arch_supports_gmem_mmap(kvm)             \
> > > +     (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&   \
> > > +      (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
> >
> > I want to share the findings when I do the POC to enable gmem mmap in QEMU.
> >
> > Actually, QEMU can use gmem with mmap support as the normal memory even
> > without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
> > on KVM_SET_USER_MEMORY_REGION2.
> >
> > Since the gmem is mmapable, QEMU can pass the userspace addr got from
> > mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
> > works well for non-coco VMs on x86.
> >
> > Then it seems feasible to use gmem with mmap for the shared memory of
> > TDX, and an additional gmem without mmap for the private memory. i.e.,
> > For struct kvm_userspace_memory_region, the @userspace_addr is passed
> > with the uaddr returned from gmem0 with mmap, while @guest_memfd is
> > passed with another gmem1 fd without mmap.
> >
> > However, it fails actually, because the kvm_arch_suports_gmem_mmap()
> > returns false for TDX VMs, which means userspace cannot allocate gmem
> > with mmap just for shared memory for TDX.
> 
> Why do you want such a usecase to work?

I'm guessing Xiaoyao was asking an honest question in response to finding a
perceived flaw when trying to get this all working in QEMU.

> If kvm allows mappable guest_memfd files for TDX VMs without
> conversion support, userspace will be able to use those for backing

s/able/unable?

> private memory unless:
> 1) KVM checks at binding time if the guest_memfd passed during memslot
> creation is not a mappable one and doesn't enforce "not mappable"
> requirement for TDX VMs at creation time.

Xiaoyao's question is about "just for shared memory", so this is irrelevant for
the question at hand.

> 2) KVM fetches shared faults through userspace page tables and not
> guest_memfd directly.

This is also irrelevant.  KVM _already_ supports resolving shared faults through
userspace page tables.  That support won't go away as KVM will always need/want
to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).

> I don't see value in trying to go out of way to support such a usecase.

But if/when KVM gains support for tracking shared vs. private in guest_memfd
itself, i.e. when TDX _does_ support mmap() on guest_memfd, KVM won't have to go
out of its way to support using guest_memfd for the @userspace_addr backing store.
Unless I'm missing something, the only thing needed to "support" this scenario is:

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index d01bd7a2c2bd..34403d2f1eeb 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -533,7 +533,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
        u64 flags = args->flags;
        u64 valid_flags = 0;
 
-       if (kvm_arch_supports_gmem_mmap(kvm))
+       // if (kvm_arch_supports_gmem_mmap(kvm))
                valid_flags |= GUEST_MEMFD_FLAG_MMAP;
 
        if (flags & ~valid_flags)

I think the question we actually want to answer is: do we want to go out of our
way to *prevent* such a usecase.  E.g. is there any risk/danger that we need to
mitigate, and would the cost of the mitigation be acceptable?

I think the answer is "no", because preventing userspace from using guest_memfd
as shared-only memory would require resolving the VMA during hva_to_pfn() in order
to fully prevent such behavior, and I definitely don't want to take mmap_lock
around hva_to_pfn_fast().

I don't see any obvious danger lurking.  KVM's pre-guest_memfd memory management
scheme is all about effectively making KVM behave like "just another" userspace
agent.  E.g. if/when TDX/SNP support comes along, guest_memfd must not allow mapping
private memory into userspace regardless of what KVM supports for page faults.

So unless I'm missing something, for now we do nothing, and let this support come
along naturally once TDX supports mmap() on guest_memfd.



* Re: [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem()
  2025-07-18  1:42   ` Xiaoyao Li
@ 2025-07-21 14:47     ` Sean Christopherson
  2025-07-21 14:55     ` Fuad Tabba
  1 sibling, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 14:47 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Fri, Jul 18, 2025, Xiaoyao Li wrote:
> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > -/* SMM is currently unsupported for guests with private memory. */
> > +/* SMM is currently unsupported for guests with guest_memfd private memory. */
> >   # define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)
> 
> As I commented in the v14, please don't change the comment.

+1, keep it as is.



* Re: [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem()
  2025-07-18  1:42   ` Xiaoyao Li
  2025-07-21 14:47     ` Sean Christopherson
@ 2025-07-21 14:55     ` Fuad Tabba
  1 sibling, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-21 14:55 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Fri, 18 Jul 2025 at 02:43, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > -/* SMM is currently unsupported for guests with private memory. */
> > +/* SMM is currently unsupported for guests with guest_memfd private memory. */
> >   # define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)
>
> As I commented in the v14, please don't change the comment.
>
> It is checking kvm_arch_has_private_mem(), *not*
> kvm_arch_supports_gmem(). So why bother mentioning guest_memfd here?

Ack.

Thanks,
/fuad

>



* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 14:42       ` Sean Christopherson
@ 2025-07-21 15:07         ` Xiaoyao Li
  2025-07-21 17:29           ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-21 15:07 UTC (permalink / raw)
  To: Sean Christopherson, Vishal Annapurve
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 7/21/2025 10:42 PM, Sean Christopherson wrote:
> On Mon, Jul 21, 2025, Vishal Annapurve wrote:
>> On Mon, Jul 21, 2025 at 5:22 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>
>>> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
>>>> +/*
>>>> + * CoCo VMs with hardware support that use guest_memfd only for backing private
>>>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
>>>> + */
>>>> +#define kvm_arch_supports_gmem_mmap(kvm)             \
>>>> +     (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&   \
>>>> +      (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
>>>
>>> I want to share the findings when I do the POC to enable gmem mmap in QEMU.
>>>
>>> Actually, QEMU can use gmem with mmap support as the normal memory even
>>> without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
>>> on KVM_SET_USER_MEMORY_REGION2.
>>>
>>> Since the gmem is mmapable, QEMU can pass the userspace addr got from
>>> mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
>>> works well for non-coco VMs on x86.
>>>
>>> Then it seems feasible to use gmem with mmap for the shared memory of
>>> TDX, and an additional gmem without mmap for the private memory. i.e.,
>>> For struct kvm_userspace_memory_region, the @userspace_addr is passed
>>> with the uaddr returned from gmem0 with mmap, while @guest_memfd is
>>> passed with another gmem1 fd without mmap.
>>>
>>> However, it fails actually, because the kvm_arch_suports_gmem_mmap()
>>> returns false for TDX VMs, which means userspace cannot allocate gmem
>>> with mmap just for shared memory for TDX.
>>
>> Why do you want such a usecase to work?
> 
> I'm guessing Xiaoyao was asking an honest question in response to finding a
> perceived flaw when trying to get this all working in QEMU.

I'm not sure it is a flaw. It just seems counter-intuitive to me that such 
a usecase is not supported.

>> If kvm allows mappable guest_memfd files for TDX VMs without
>> conversion support, userspace will be able to use those for backing
> 
> s/able/unable?

I think Vishal meant "able", because ...

>> private memory unless:
>> 1) KVM checks at binding time if the guest_memfd passed during memslot
>> creation is not a mappable one and doesn't enforce "not mappable"
>> requirement for TDX VMs at creation time.
> 
> Xiaoyao's question is about "just for shared memory", so this is irrelevant for
> the question at hand.

... if we allow gmem mmap for TDX, KVM needs to ensure that a mappable gmem 
can only be passed via userspace_addr. IOW, KVM needs to forbid userspace 
from passing an mmap'able guest_memfd to 
kvm_userspace_memory_region2.guest_memfd, because that would allow 
userspace to access the private memory.

>> 2) KVM fetches shared faults through userspace page tables and not
>> guest_memfd directly.
> 
> This is also irrelevant.  KVM _already_ supports resolving shared faults through
> userspace page tables.  That support won't go away as KVM will always need/want
> to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).
> 
>> I don't see value in trying to go out of way to support such a usecase.
> 
> But if/when KVM gains support for tracking shared vs. private in guest_memfd
> itself, i.e. when TDX _does_ support mmap() on guest_memfd, KVM won't have to go
> out of its to support using guest_memfd for the @userspace_addr backing store.
> Unless I'm missing something, the only thing needed to "support" this scenario is:

As above, we also need 1) as mentioned by Vishal, to prevent userspace 
from passing a mappable guest_memfd to serve as private memory.

> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index d01bd7a2c2bd..34403d2f1eeb 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -533,7 +533,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>          u64 flags = args->flags;
>          u64 valid_flags = 0;
>   
> -       if (kvm_arch_supports_gmem_mmap(kvm))
> +       // if (kvm_arch_supports_gmem_mmap(kvm))
>                  valid_flags |= GUEST_MEMFD_FLAG_MMAP;
>   
>          if (flags & ~valid_flags)
> 
> I think the question we actually want to answer is: do we want to go out of our
> way to *prevent* such a usecase.  E.g. is there any risk/danger that we need to
> mitigate, and would the cost of the mitigation be acceptable?
> 
> I think the answer is "no", because preventing userspace from using guest_memfd
> as shared-only memory would require resolving the VMA during hva_to_pfn() in order
> to fully prevent such behavior, and I defintely don't want to take mmap_lock
> around hva_to_pfn_fast().
> 
> I don't see any obvious danger lurking.  KVM's pre-guest_memfd memory management
> scheme is all about effectively making KVM behave like "just another" userspace
> agent.  E.g. if/when TDX/SNP support comes along, guest_memfd must not allow mapping
> private memory into userspace regardless of what KVM supports for page faults.
> 
> So unless I'm missing something, for now we do nothing, and let this support come
> along naturally once TDX support mmap() on guest_memfd.




* Re: [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
  2025-07-17 16:27 ` [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
@ 2025-07-21 15:17   ` Sean Christopherson
  2025-07-21 15:26     ` Fuad Tabba
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 15:17 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jul 17, 2025, Fuad Tabba wrote:
> Rename the Kconfig option CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM. 

Please name this CONFIG_KVM_GUEST_MEMFD.  I'm a-ok using gmem as the namespace
for functions/macros/variables, but there's zero reason to shorten things like
Kconfigs.

> @@ -719,10 +719,10 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
>  #endif
>  
>  /*
> - * Arch code must define kvm_arch_has_private_mem if support for private memory
> - * is enabled.
> + * Arch code must define kvm_arch_has_private_mem if support for guest_memfd is
> + * enabled.

This is undesirable, and the comment is flat out wrong.  As evidenced by the lack
of a #define in arm64, arch does NOT need to #define kvm_arch_has_private_mem if
CONFIG_KVM_GUEST_MEMFD=y.  It "works" because the sole caller to kvm_arch_has_private_mem()
is guarded by CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES=y, and that's never selected
by arm64.

I.e. this needs to key off of CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES=y, not off of
CONFIG_KVM_GUEST_MEMFD=y.  And I would just drop the comment altogether at that
point, because it's all quite self-explanatory:

#ifndef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
{
	return false;
}
#endif


>   */
> -#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
> +#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
>  static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
>  {
>  	return false;
> @@ -2527,7 +2527,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>  
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
> -	return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
> +	return IS_ENABLED(CONFIG_KVM_GMEM) &&

And this is equally wrong.  The existing code checked CONFIG_KVM_PRIVATE_MEM,
because memory obviously can't be private if private memory is unsupported.

But that logic chain doesn't work as well for guest_memfd.  In a way, this is a
weird semantic change, e.g. it changes from "select guest_memfd if private memory
is supported" to "allow private memory if guest_memfd is selected".  The former
existed because compiling in support for guest_memfd when it couldn't possibly
be used was wasteful, but even then it was somewhat superfluous.

The latter is an arbitrary requirement that probably shouldn't exist, and if we
did want to make it a hard requirement, should be expressed in the Kconfig
dependency, not here.

TL;DR: drop the IS_ENABLED(CONFIG_KVM_GMEM) check.



* Re: [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
  2025-07-21 15:17   ` Sean Christopherson
@ 2025-07-21 15:26     ` Fuad Tabba
  0 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-21 15:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Sean,

On Mon, 21 Jul 2025 at 16:17, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jul 17, 2025, Fuad Tabba wrote:
> > Rename the Kconfig option CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM.
>
> Please name this CONFIG_KVM_GUEST_MEMFD.  I'm a-ok using gmem as the namespace
> for functions/macros/variables, but there's zero reason to shorten things like
> Kconfigs.

Ack.

> > @@ -719,10 +719,10 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
> >  #endif
> >
> >  /*
> > - * Arch code must define kvm_arch_has_private_mem if support for private memory
> > - * is enabled.
> > + * Arch code must define kvm_arch_has_private_mem if support for guest_memfd is
> > + * enabled.
>
> This is undesirable, and the comment is flat out wrong.  As evidenced by the lack
> of a #define in arm64, arch does NOT need to #define kvm_arch_has_private_mem if
> CONFIG_KVM_GUEST_MEMFD=y.  It "works" because the sole caller to kvm_arch_has_private_mem()
> is guarded by CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES=y, and that's never selected
> by arm64.
>
> I.e. this needs to key off of CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES=y, not off of
> CONFIG_KVM_GUEST_MEMFD=y.  And I would just drop the comment altogether at that
> point, because it's all quite self-explanatory:
>
> #ifndef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> {
>         return false;
> }
> #endif

Ack.

>
> >   */
> > -#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
> > +#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> >  static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> >  {
> >       return false;
> > @@ -2527,7 +2527,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> >
> >  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >  {
> > -     return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
> > +     return IS_ENABLED(CONFIG_KVM_GMEM) &&
>
> And this is equally wrong.  The existing code checked CONFIG_KVM_PRIVATE_MEM,
> because memory obviously can't be private if private memory is unsupported.
>
> But that logic chain doesn't work as well for guest_memfd.  In a way, this is a
> weird semantic change, e.g. it changes from "select guest_memfd if private memory
> is supported" to "allow private memory if guest_memfd is select".   The former
> existed because compiling in support for guest_memfd when it coulnd't possibly
> be used was wasteful, but even then it was somewhat superfluous.
>
> The latter is an arbitrary requirement that probably shouldn't exist, and if we
> did want to make it a hard requirement, should be expressed in the Kconfig
> dependency, not here.
>
> TL;DR: drop the IS_ENABLED(CONFIG_KVM_GMEM) check.

Ack.

Thanks!
/fuad



* Re: [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-17 16:27 ` [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
@ 2025-07-21 16:44   ` Sean Christopherson
  2025-07-21 16:51     ` Fuad Tabba
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 16:44 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jul 17, 2025, Fuad Tabba wrote:
> The original name was vague regarding its functionality. 

It was intentionally vague/broad so that KVM didn't end up with an explosion of
Kconfigs.

> This Kconfig option specifically enables and gates the kvm_gmem_populate()
> function, which is responsible for populating a GPA range with guest data.

And obviously selects KVM_GENERIC_MEMORY_ATTRIBUTES...

> The new name, KVM_GENERIC_GMEM_POPULATE, describes the purpose of the
> option: to enable generic guest_memfd population mechanisms. 

As above, the purpose of KVM_GENERIC_PRIVATE_MEM isn't just to enable
kvm_gmem_populate().  In fact, the Kconfig predates kvm_gmem_populate().  The
main reason KVM_GENERIC_PRIVATE_MEM was added was to avoid having to select the
same set of Kconfigs in every flavor of CoCo-ish VM, i.e. was to avoid what this
patch does.

There was a bit of mis-speculation in that x86 ended up being the only arch that
wants KVM_GENERIC_MEMORY_ATTRIBUTES, so we should simply remedy that.  Providing
KVM_PRIVATE_MEM in x86 would also clean up this mess:

	select KVM_GMEM if KVM_SW_PROTECTED_VM
	select KVM_GENERIC_MEMORY_ATTRIBUTES if KVM_SW_PROTECTED_VM
	select KVM_GMEM_SUPPORTS_MMAP if X86_64

Where KVM_GMEM_SUPPORTS_MMAP and thus KVM_GUEST_MEMFD is selected by X86_64.
I.e. X86_64 is subtly *unconditionally* enabling guest_memfd.  I have no objection
to always supporting guest_memfd for 64-bit, but it should be obvious, not buried
in a chain of Kconfig selects.

More importantly, the above means it's impossible to have KVM_GMEM without
KVM_GMEM_SUPPORTS_MMAP, because arm64 always selects KVM_GMEM_SUPPORTS_MMAP, and
x86 can only select KVM_GMEM when KVM_GMEM_SUPPORTS_MMAP is forced/selected.
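Condensed, the select relationships above can be sketched like so (a hypothetical
condensation of the quoted Kconfig fragments, not a literal excerpt from the tree):

```kconfig
# arm64: mmap support is always pulled in alongside guest_memfd.
menuconfig KVM
	select KVM_GMEM
	select KVM_GMEM_SUPPORTS_MMAP

# x86: KVM_GMEM is only reachable via KVM_SW_PROTECTED_VM, which depends
# on X86_64, and X86_64 already forces KVM_GMEM_SUPPORTS_MMAP.
config KVM_X86
	select KVM_GMEM if KVM_SW_PROTECTED_VM
	select KVM_GMEM_SUPPORTS_MMAP if X86_64
```

In every reachable configuration, KVM_GMEM=y therefore implies
KVM_GMEM_SUPPORTS_MMAP=y.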

Following that trail of breadcrumbs, x86 ends up with another tautology that isn't
captured.  kvm_arch_supports_gmem() is true for literally every type of VM.  It
isn't true for every #defined VM type, since it's not allowed for KVM_X86_SEV_VM
or KVM_X86_SEV_ES_VM.  But those are recent additions that are entirely optional.
I.e. userspace can create SEV and/or SEV-ES VMs using KVM_X86_DEFAULT_VM.

And if we fix that oddity, and follow more breadcrumbs, we arrive at
kvm_arch_supports_gmem_mmap(), where it unnecessarily open codes a check on
KVM_X86_DEFAULT_VM when in fact the real restriction is that guest_memfd mmap()
is currently incompatible with kvm_arch_has_private_mem().

I already have a NAK typed up for patch 3 for completely unrelated reasons (adding
arch.supports_gmem creates unnecessary potential for bugs, e.g. allows checking
kvm_arch_supports_gmem() before the flag is set).  That's all the more reason to
kill off as many of these #defines and checks as possible.

Oh, and that also ties into Xiaoyao's question about what to do with mapping
guest_memfd into a memslot without a guest_memfd file descriptor.  Once we add
private vs. shared tracking in guest_memfd, kvm_arch_supports_gmem_mmap() becomes
true if CONFIG_KVM_GUEST_MEMFD=y.

Heh, so going through all of that, KVM_PRIVATE_MEM just ends up being this:

config KVM_PRIVATE_MEM
	depends on X86_64
	select KVM_GENERIC_MEMORY_ATTRIBUTES
	bool

which means my initial feedback that prompted this becomes null and void :-)

That said, I think we should take this opportunity to select KVM_GENERIC_MEMORY_ATTRIBUTES
directly instead of having it selected from "config KVM".  There's a similar
oddity with TDX.

> improves clarity for developers and ensures the name accurately reflects
> the functionality it controls, especially as guest_memfd support expands
> beyond purely "private" memory scenarios.
> 
> Note that the vm type KVM_X86_SW_PROTECTED_VM does not need the populate
> function. Therefore, ensure that the correct configuration is selected
> when KVM_SW_PROTECTED_VM is enabled.
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/x86/kvm/Kconfig     | 7 ++++---
>  include/linux/kvm_host.h | 2 +-
>  virt/kvm/Kconfig         | 2 +-
>  virt/kvm/guest_memfd.c   | 2 +-
>  4 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 2eeffcec5382..12e723bb76cc 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -46,7 +46,8 @@ config KVM_X86
>  	select HAVE_KVM_PM_NOTIFIER if PM
>  	select KVM_GENERIC_HARDWARE_ENABLING
>  	select KVM_GENERIC_PRE_FAULT_MEMORY
> -	select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
> +	select KVM_GMEM if KVM_SW_PROTECTED_VM
> +	select KVM_GENERIC_MEMORY_ATTRIBUTES if KVM_SW_PROTECTED_VM
>  	select KVM_WERROR if WERROR
>  
>  config KVM
> @@ -95,7 +96,7 @@ config KVM_SW_PROTECTED_VM
>  config KVM_INTEL
>  	tristate "KVM for Intel (and compatible) processors support"
>  	depends on KVM && IA32_FEAT_CTL
> -	select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
> +	select KVM_GENERIC_GMEM_POPULATE if INTEL_TDX_HOST
>  	select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
>  	help
>  	  Provides support for KVM on processors equipped with Intel's VT
> @@ -157,7 +158,7 @@ config KVM_AMD_SEV
>  	depends on KVM_AMD && X86_64
>  	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
>  	select ARCH_HAS_CC_PLATFORM
> -	select KVM_GENERIC_PRIVATE_MEM
> +	select KVM_GENERIC_GMEM_POPULATE
>  	select HAVE_KVM_ARCH_GMEM_PREPARE
>  	select HAVE_KVM_ARCH_GMEM_INVALIDATE
>  	help
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 755b09dcafce..359baaae5e9f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2556,7 +2556,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>  int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
>  #endif
>  
> -#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
> +#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
>  /**
>   * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
>   *
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 49df4e32bff7..559c93ad90be 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -116,7 +116,7 @@ config KVM_GMEM
>         select XARRAY_MULTI
>         bool
>  
> -config KVM_GENERIC_PRIVATE_MEM
> +config KVM_GENERIC_GMEM_POPULATE
>         select KVM_GENERIC_MEMORY_ATTRIBUTES
>         select KVM_GMEM

This is where things really start to break down.  Selecting KVM_GUEST_MEMFD and
KVM_GENERIC_MEMORY_ATTRIBUTES when KVM_GENERIC_PRIVATE_MEM=y is decent logic.
*Selecting* KVM_GUEST_MEMFD from a sub-feature of guest_memfd is weird.

I don't love HAVE_KVM_ARCH_GMEM_INVALIDATE and HAVE_KVM_ARCH_GMEM_PREPARE, as I
think they're too fine-grained.  But that's largely an orthogonal problem, and
it's not clear that bundling them together would be an improvement.  So, I think
we should just follow those and add HAVE_KVM_ARCH_GMEM_POPULATE, selected by SEV
and TDX.

The below diff applies on top.  I'm guessing there may be some intermediate
ugliness (I haven't mapped out exactly where/how to squash this throughout the
series, and there is feedback relevant to future patches), but IMO this is a much
cleaner resting state (see the diff stats).

---
 arch/arm64/include/asm/kvm_host.h |  5 -----
 arch/arm64/kvm/Kconfig            |  3 +--
 arch/x86/include/asm/kvm_host.h   | 15 +-------------
 arch/x86/kvm/Kconfig              | 10 +++++----
 arch/x86/kvm/x86.c                | 13 ++++++++++--
 include/linux/kvm_host.h          | 34 +++++--------------------------
 virt/kvm/Kconfig                  | 11 +++-------
 virt/kvm/guest_memfd.c            | 10 +++++----
 virt/kvm/kvm_main.c               |  8 +++-----
 9 files changed, 36 insertions(+), 73 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 63f7827cfa1b..3408174ec945 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1674,9 +1674,4 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
 void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
 void check_feature_map(void);
 
-#ifdef CONFIG_KVM_GMEM
-#define kvm_arch_supports_gmem(kvm) true
-#define kvm_arch_supports_gmem_mmap(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP)
-#endif
-
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 323b46b7c82f..bff62e75d681 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -37,8 +37,7 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
-	select KVM_GMEM
-	select KVM_GMEM_SUPPORTS_MMAP
+	select KVM_GUEST_MEMFD
 	help
 	  Support hosting virtualized guest machines.
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e1426adfa93e..d93560769465 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2276,21 +2276,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 		       int tdp_max_root_level, int tdp_huge_page_level);
 
 
-#ifdef CONFIG_KVM_GMEM
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
-#define kvm_arch_supports_gmem(kvm)  ((kvm)->arch.supports_gmem)
-
-/*
- * CoCo VMs with hardware support that use guest_memfd only for backing private
- * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
- */
-#define kvm_arch_supports_gmem_mmap(kvm)		\
-	(IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&	\
-	 (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
-#else
-#define kvm_arch_has_private_mem(kvm) false
-#define kvm_arch_supports_gmem(kvm) false
-#define kvm_arch_supports_gmem_mmap(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2eeffcec5382..afcf8628f615 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -46,8 +46,8 @@ config KVM_X86
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
 	select KVM_GENERIC_PRE_FAULT_MEMORY
-	select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
 	select KVM_WERROR if WERROR
+	select KVM_GUEST_MEMFD if X86_64
 
 config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
@@ -84,6 +84,7 @@ config KVM_SW_PROTECTED_VM
 	bool "Enable support for KVM software-protected VMs"
 	depends on EXPERT
 	depends on KVM && X86_64
+	select KVM_GENERIC_MEMORY_ATTRIBUTES
 	help
 	  Enable support for KVM software-protected VMs.  Currently, software-
 	  protected VMs are purely a development and testing vehicle for
@@ -95,8 +96,6 @@ config KVM_SW_PROTECTED_VM
 config KVM_INTEL
 	tristate "KVM for Intel (and compatible) processors support"
 	depends on KVM && IA32_FEAT_CTL
-	select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
-	select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
 	help
 	  Provides support for KVM on processors equipped with Intel's VT
 	  extensions, a.k.a. Virtual Machine Extensions (VMX).
@@ -135,6 +134,8 @@ config KVM_INTEL_TDX
 	bool "Intel Trust Domain Extensions (TDX) support"
 	default y
 	depends on INTEL_TDX_HOST
+	select KVM_GENERIC_MEMORY_ATTRIBUTES
+	select HAVE_KVM_ARCH_GMEM_POPULATE
 	help
 	  Provides support for launching Intel Trust Domain Extensions (TDX)
 	  confidential VMs on Intel processors.
@@ -157,9 +158,10 @@ config KVM_AMD_SEV
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	select ARCH_HAS_CC_PLATFORM
-	select KVM_GENERIC_PRIVATE_MEM
+	select KVM_GENERIC_MEMORY_ATTRIBUTES
 	select HAVE_KVM_ARCH_GMEM_PREPARE
 	select HAVE_KVM_ARCH_GMEM_INVALIDATE
+	select HAVE_KVM_ARCH_GMEM_POPULATE
 	help
 	  Provides support for launching encrypted VMs which use Secure
 	  Encrypted Virtualization (SEV), Secure Encrypted Virtualization with
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca99187a566e..b6961b4b7aee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12781,8 +12781,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm->arch.vm_type = type;
 	kvm->arch.has_private_mem = (type == KVM_X86_SW_PROTECTED_VM);
-	kvm->arch.supports_gmem =
-		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
 	/* Decided by the vendor code for other VM types.  */
 	kvm->arch.pre_fault_allowed =
 		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
@@ -13708,6 +13706,16 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+/*
+ * KVM doesn't yet support mmap() on guest_memfd for VMs with private memory
+ * (the private vs. shared tracking needs to be moved into guest_memfd).
+ */
+bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+	return !kvm_arch_has_private_mem(kvm);
+}
+
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
 {
@@ -13721,6 +13729,7 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 	kvm_x86_call(gmem_invalidate)(start, end);
 }
 #endif
+#endif
 
 int kvm_spec_ctrl_test_value(u64 value)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2c1dcd3967d9..a9f31b2b63b1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -719,39 +719,15 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
 }
 #endif
 
-/*
- * Arch code must define kvm_arch_has_private_mem if support for guest_memfd is
- * enabled.
- */
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
+#ifndef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 {
 	return false;
 }
 #endif
 
-/*
- * Arch code must define kvm_arch_supports_gmem if support for guest_memfd is
- * enabled.
- */
-#if !defined(kvm_arch_supports_gmem) && !IS_ENABLED(CONFIG_KVM_GMEM)
-static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
-{
-	return false;
-}
-#endif
-
-/*
- * Returns true if this VM supports mmap() in guest_memfd.
- *
- * Arch code must define kvm_arch_supports_gmem_mmap if support for guest_memfd
- * is enabled.
- */
-#if !defined(kvm_arch_supports_gmem_mmap)
-static inline bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
-{
-	return false;
-}
+#ifdef CONFIG_KVM_GUEST_MEMFD
+bool kvm_arch_supports_gmem_mmap(struct kvm *kvm);
 #endif
 
 #ifndef kvm_arch_has_readonly_mem
@@ -2539,7 +2515,7 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 
 static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
 {
-	if (!IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP))
+	if (!IS_ENABLED(CONFIG_KVM_GUEST_MEMFD))
 		return false;
 
 	return slot->flags & KVM_MEMSLOT_GMEM_ONLY;
@@ -2596,7 +2572,7 @@ static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
 #endif
 
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
 /**
  * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
  *
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 96cf4ab0d534..9d472f46ebf1 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -112,15 +112,10 @@ config KVM_GENERIC_MEMORY_ATTRIBUTES
        depends on KVM_GENERIC_MMU_NOTIFIER
        bool
 
-config KVM_GMEM
+config KVM_GUEST_MEMFD
        select XARRAY_MULTI
        bool
 
-config KVM_GENERIC_PRIVATE_MEM
-       select KVM_GENERIC_MEMORY_ATTRIBUTES
-       select KVM_GMEM
-       bool
-
 config HAVE_KVM_ARCH_GMEM_PREPARE
        bool
        depends on KVM_GMEM
@@ -129,6 +124,6 @@ config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
        depends on KVM_GMEM
 
-config KVM_GMEM_SUPPORTS_MMAP
-       select KVM_GMEM
+config HAVE_KVM_ARCH_GMEM_POPULATE
        bool
+       depends on KVM_GMEM
\ No newline at end of file
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index d01bd7a2c2bd..57db0041047a 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -316,9 +316,6 @@ static bool kvm_gmem_supports_mmap(struct inode *inode)
 {
 	const u64 flags = (u64)inode->i_private;
 
-	if (!IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP))
-		return false;
-
 	return flags & GUEST_MEMFD_FLAG_MMAP;
 }
 
@@ -527,6 +524,11 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	return err;
 }
 
+bool __weak kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+	return true;
+}
+
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 {
 	loff_t size = args->size;
@@ -730,7 +732,7 @@ int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
 
-#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f1ac872e01e9..1b609e35303f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1588,7 +1588,7 @@ static int check_memory_region_flags(struct kvm *kvm,
 {
 	u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
-	if (kvm_arch_supports_gmem(kvm))
+	if (IS_ENABLED(CONFIG_KVM_GUEST_MEMFD))
 		valid_flags |= KVM_MEM_GUEST_MEMFD;
 
 	/* Dirty logging private memory is not currently supported. */
@@ -4915,10 +4915,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 #ifdef CONFIG_KVM_GMEM
 	case KVM_CAP_GUEST_MEMFD:
-		return !kvm || kvm_arch_supports_gmem(kvm);
-#endif
-#ifdef CONFIG_KVM_GMEM_SUPPORTS_MMAP
-	case KVM_CAP_GMEM_MMAP:
+		return 1;
+	case KVM_CAP_GUEST_MEMFD_MMAP:
 		return !kvm || kvm_arch_supports_gmem_mmap(kvm);
 #endif
 	default:

base-commit: 9eba3a9ac9cd5922da7f6e966c01190f909ed640
--



* Re: [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem()
  2025-07-17 16:27 ` [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem() Fuad Tabba
  2025-07-18  1:42   ` Xiaoyao Li
@ 2025-07-21 16:44   ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 16:44 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jul 17, 2025, Fuad Tabba wrote:
> Introduce kvm_arch_supports_gmem() to explicitly indicate whether an
> architecture supports guest_memfd.
> 
> Previously, kvm_arch_has_private_mem() was used to check for guest_memfd
> support. However, this conflated guest_memfd with "private" memory,
> implying that guest_memfd was exclusively for CoCo VMs or other private
> memory use cases.
> 
> With the expansion of guest_memfd to support non-private memory, such as
> shared host mappings, it is necessary to decouple these concepts. The
> new kvm_arch_supports_gmem() function provides a clear way to check for
> guest_memfd support.
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  4 +++-
>  include/linux/kvm_host.h        | 11 +++++++++++
>  virt/kvm/kvm_main.c             |  4 ++--
>  3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index acb25f935d84..bde811b2d303 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2277,8 +2277,10 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  
>  #ifdef CONFIG_KVM_GMEM
>  #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
> +#define kvm_arch_supports_gmem(kvm) kvm_arch_has_private_mem(kvm)

Don't support/use macros; just make kvm_arch_supports_gmem() a "normal" arch hook.
kvm_arch_has_private_mem() is a macro so that kvm_arch_nr_memslot_as_ids() can
resolve to a compile-time constant when CONFIG_KVM_GMEM is false, and so that it's
a trivial check when true.

kvm_arch_supports_gmem() is only ever used in slow paths, check_memory_region_flags()
and kvm_vm_ioctl_check_extension_generic().  Of course, after my suggestions from
patch 2, it goes away completely.

>  #else
>  #define kvm_arch_has_private_mem(kvm) false
> +#define kvm_arch_supports_gmem(kvm) false

This is silly.  It adds code to x86 *and* makes the generic code more complex.
Again, a moot point in the end, but for future reference this isn't a pattern we
should encourage.
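
For future reference, the encouraged pattern can be sketched in a self-contained
form (the struct, the CONFIG stub, and the file layout are illustrative, not the
actual KVM headers): the arch header defines the macro only when the feature is
compiled in, and generic code supplies the single fallback, so the arch never
carries a duplicate "false" stub.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the real KVM types and Kconfig symbol. */
struct kvm { bool has_private_mem; };
#define CONFIG_KVM_GMEM 1

/* "Arch header": define the macro only when the feature exists. */
#ifdef CONFIG_KVM_GMEM
#define kvm_arch_has_private_mem(kvm) ((kvm)->has_private_mem)
#endif

/* "Generic header": one fallback, instead of a per-arch false stub. */
#ifndef kvm_arch_has_private_mem
static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
{
	return false;
}
#endif
```

With CONFIG_KVM_GMEM undefined, only the generic inline exists; with it defined,
the arch macro wins and the fallback preprocesses away.  Either way, no arch
needs to spell out both variants.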



* Re: [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem
  2025-07-17 16:27 ` [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem Fuad Tabba
@ 2025-07-21 16:45   ` Sean Christopherson
  2025-07-21 17:00     ` Fuad Tabba
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 16:45 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jul 17, 2025, Fuad Tabba wrote:
> Introduce a new boolean member, supports_gmem, to kvm->arch.
> 
> Previously, the has_private_mem boolean within kvm->arch was implicitly
> used to indicate whether guest_memfd was supported for a KVM instance.
> However, with the broader support for guest_memfd, it's not exclusively
> for private or confidential memory. Therefore, it's necessary to
> distinguish between a VM's general guest_memfd capabilities and its
> support for private memory.
> 
> This new supports_gmem member will now explicitly indicate guest_memfd
> support for a given VM, allowing has_private_mem to represent only
> support for private memory.
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

NAK, this introduces unnecessary potential for bugs, e.g. KVM will get a false
negative if kvm_arch_supports_gmem() is invoked before kvm_x86_ops.vm_init().

Patch 2 makes this a moot point because kvm_arch_supports_gmem() can simply go away.

> ---
>  arch/x86/include/asm/kvm_host.h | 3 ++-
>  arch/x86/kvm/svm/svm.c          | 1 +
>  arch/x86/kvm/vmx/tdx.c          | 1 +
>  arch/x86/kvm/x86.c              | 4 ++--
>  4 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index bde811b2d303..938b5be03d33 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1348,6 +1348,7 @@ struct kvm_arch {
>  	u8 mmu_valid_gen;
>  	u8 vm_type;
>  	bool has_private_mem;
> +	bool supports_gmem;
>  	bool has_protected_state;
>  	bool pre_fault_allowed;
>  	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> @@ -2277,7 +2278,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  
>  #ifdef CONFIG_KVM_GMEM
>  #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
> -#define kvm_arch_supports_gmem(kvm) kvm_arch_has_private_mem(kvm)
> +#define kvm_arch_supports_gmem(kvm)  ((kvm)->arch.supports_gmem)
>  #else
>  #define kvm_arch_has_private_mem(kvm) false
>  #define kvm_arch_supports_gmem(kvm) false
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ab9b947dbf4f..d1c484eaa8ad 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5181,6 +5181,7 @@ static int svm_vm_init(struct kvm *kvm)
>  		to_kvm_sev_info(kvm)->need_init = true;
>  
>  		kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
> +		kvm->arch.supports_gmem = (type == KVM_X86_SNP_VM);
>  		kvm->arch.pre_fault_allowed = !kvm->arch.has_private_mem;
>  	}
>  
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index f31ccdeb905b..a3db6df245ee 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -632,6 +632,7 @@ int tdx_vm_init(struct kvm *kvm)
>  
>  	kvm->arch.has_protected_state = true;
>  	kvm->arch.has_private_mem = true;
> +	kvm->arch.supports_gmem = true;
>  	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;
>  
>  	/*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 357b9e3a6cef..adbdc2cc97d4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12780,8 +12780,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  		return -EINVAL;
>  
>  	kvm->arch.vm_type = type;
> -	kvm->arch.has_private_mem =
> -		(type == KVM_X86_SW_PROTECTED_VM);
> +	kvm->arch.has_private_mem = (type == KVM_X86_SW_PROTECTED_VM);
> +	kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
>  	/* Decided by the vendor code for other VM types.  */
>  	kvm->arch.pre_fault_allowed =
>  		type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
> -- 
> 2.50.0.727.gbf7dc18ff4-goog
> 



* Re: [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-07-17 16:27 ` [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
  2025-07-18  6:09   ` Xiaoyao Li
@ 2025-07-21 16:47   ` Sean Christopherson
  2025-07-21 16:56     ` Fuad Tabba
  2025-07-22  5:41     ` Xiaoyao Li
  1 sibling, 2 replies; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 16:47 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jul 17, 2025, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> Update the KVM MMU fault handler to service guest page faults
> for memory slots backed by guest_memfd with mmap support. For such
> slots, the MMU must always fault in pages directly from guest_memfd,
> bypassing the host's userspace_addr.
> 
> This ensures that guest_memfd-backed memory is always handled through
> the guest_memfd specific faulting path, regardless of whether it's for
> private or non-private (shared) use cases.
> 
> Additionally, rename kvm_mmu_faultin_pfn_private() to
> kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in
> pages from guest_memfd for both private and non-private memory,
> accommodating the new use cases.
> 
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 94be15cde6da..ad5f337b496c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4511,8 +4511,8 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
>  				 r == RET_PF_RETRY, fault->map_writable);
>  }
>  
> -static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> -				       struct kvm_page_fault *fault)
> +static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> +				    struct kvm_page_fault *fault)
>  {
>  	int max_order, r;
>  
> @@ -4536,13 +4536,18 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
>  	return RET_PF_CONTINUE;
>  }
>  
> +static bool fault_from_gmem(struct kvm_page_fault *fault)

Drop the helper.  It has exactly one caller, and it makes the code *harder* to
read, e.g. raises the question of what "from gmem" even means.  If a separate
series follows and needs/justifies this helper, then it can/should be added then.

> +{
> +	return fault->is_private || kvm_memslot_is_gmem_only(fault->slot);
> +}
> +
>  static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>  				 struct kvm_page_fault *fault)
>  {
>  	unsigned int foll = fault->write ? FOLL_WRITE : 0;
>  
> -	if (fault->is_private)
> -		return kvm_mmu_faultin_pfn_private(vcpu, fault);
> +	if (fault_from_gmem(fault))
> +		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>  
>  	foll |= FOLL_NOWAIT;
>  	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
> -- 
> 2.50.0.727.gbf7dc18ff4-goog
> 



* Re: [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-21 16:44   ` Sean Christopherson
@ 2025-07-21 16:51     ` Fuad Tabba
  2025-07-21 17:33       ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-21 16:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Sean,

On Mon, 21 Jul 2025 at 17:44, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jul 17, 2025, Fuad Tabba wrote:
> > The original name was vague regarding its functionality.
>
> It was intentionally vague/broad so that KVM didn't end up with an explosion of
> Kconfigs.
>
> > This Kconfig option specifically enables and gates the kvm_gmem_populate()
> > function, which is responsible for populating a GPA range with guest data.
>
> And obviously selects KVM_GENERIC_MEMORY_ATTRIBUTES...
>
> > The new name, KVM_GENERIC_GMEM_POPULATE, describes the purpose of the
> > option: to enable generic guest_memfd population mechanisms.
>
> As above, the purpose of KVM_GENERIC_PRIVATE_MEM isn't just to enable
> kvm_gmem_populate().  In fact, the Kconfig predates kvm_gmem_populate().  The
> main reason KVM_GENERIC_PRIVATE_MEM was added was to avoid having to select the
> same set of Kconfigs in every flavor of CoCo-ish VM, i.e. was to avoid what this
> patch does.
>
> There was a bit of mis-speculation in that x86 ended up being the only arch that
> wants KVM_GENERIC_MEMORY_ATTRIBUTES, so we should simply remedy that.  Providing
> KVM_PRIVATE_MEM in x86 would also clean up this mess:
>
>         select KVM_GMEM if KVM_SW_PROTECTED_VM
>         select KVM_GENERIC_MEMORY_ATTRIBUTES if KVM_SW_PROTECTED_VM
>         select KVM_GMEM_SUPPORTS_MMAP if X86_64
>
> Where KVM_GMEM_SUPPORTS_MMAP and thus KVM_GUEST_MEMFD is selected by X86_64.
> I.e. X86_64 is subtly *unconditionally* enabling guest_memfd.  I have no objection
> to always supporting guest_memfd for 64-bit, but it should be obvious, not buried
> in a Kconfig select.
>
> More importantly, the above means it's impossible to have KVM_GMEM without
> KVM_GMEM_SUPPORTS_MMAP, because arm64 always selects KVM_GMEM_SUPPORTS_MMAP, and
> x86 can only select KVM_GMEM when KVM_GMEM_SUPPORTS_MMAP is forced/selected.
>
> Following that trail of breadcrumbs, x86 ends up with another tautology that isn't
> captured.  kvm_arch_supports_gmem() is true for literally every type of VM.  It
> isn't true for every #defined VM type, since it's not allowed for KVM_X86_SEV_VM
> or KVM_X86_SEV_ES_VM.  But those are recent additions that are entirely optional.
> I.e. userspace can create SEV and/or SEV-ES VMs using KVM_X86_DEFAULT_VM.
>
> And if we fix that oddity, and follow more breadcrumbs, we arrive at
> kvm_arch_supports_gmem_mmap(), where it unnecessarily open codes a check on
> KVM_X86_DEFAULT_VM when in fact the real restriction is that guest_memfd mmap()
> is currently incompatible with kvm_arch_has_private_mem().
>
> I already have a NAK typed up for patch 3 for completely unrelated reasons (adding
> arch.supports_gmem creates unnecessary potential for bugs, e.g. allows checking
> kvm_arch_supports_gmem() before the flag is set).  That's all the more reason to
> kill off as many of these #defines and checks as possible.
>
> Oh, and that also ties into Xiaoyao's question about what to do with mapping
> guest_memfd into a memslot without a guest_memfd file descriptor.  Once we add
> private vs. shared tracking in guest_memfd, kvm_arch_supports_gmem_mmap() becomes
> true if CONFIG_KVM_GUEST_MEMFD=y.
>
> Heh, so going through all of that, KVM_PRIVATE_MEM just ends up being this:
>
> config KVM_PRIVATE_MEM
>         depends on X86_64
>         select KVM_GENERIC_MEMORY_ATTRIBUTES
>         bool
>
> which means my initial feedback that prompted this becomes null and void :-)

:)

> That said, I think we should take this opportunity to select KVM_GENERIC_MEMORY_ATTRIBUTES
> directly instead of having it selected from "config KVM".  There's a similar
> oddity with TDX.
>
> > improves clarity for developers and ensures the name accurately reflects
> > the functionality it controls, especially as guest_memfd support expands
> > beyond purely "private" memory scenarios.
> >
> > Note that the vm type KVM_X86_SW_PROTECTED_VM does not need the populate
> > function. Therefore, ensure that the correct configuration is selected
> > when KVM_SW_PROTECTED_VM is enabled.
> >
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Reviewed-by: Shivank Garg <shivankg@amd.com>
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > Co-developed-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/x86/kvm/Kconfig     | 7 ++++---
> >  include/linux/kvm_host.h | 2 +-
> >  virt/kvm/Kconfig         | 2 +-
> >  virt/kvm/guest_memfd.c   | 2 +-
> >  4 files changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index 2eeffcec5382..12e723bb76cc 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -46,7 +46,8 @@ config KVM_X86
> >       select HAVE_KVM_PM_NOTIFIER if PM
> >       select KVM_GENERIC_HARDWARE_ENABLING
> >       select KVM_GENERIC_PRE_FAULT_MEMORY
> > -     select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
> > +     select KVM_GMEM if KVM_SW_PROTECTED_VM
> > +     select KVM_GENERIC_MEMORY_ATTRIBUTES if KVM_SW_PROTECTED_VM
> >       select KVM_WERROR if WERROR
> >
> >  config KVM
> > @@ -95,7 +96,7 @@ config KVM_SW_PROTECTED_VM
> >  config KVM_INTEL
> >       tristate "KVM for Intel (and compatible) processors support"
> >       depends on KVM && IA32_FEAT_CTL
> > -     select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
> > +     select KVM_GENERIC_GMEM_POPULATE if INTEL_TDX_HOST
> >       select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
> >       help
> >         Provides support for KVM on processors equipped with Intel's VT
> > @@ -157,7 +158,7 @@ config KVM_AMD_SEV
> >       depends on KVM_AMD && X86_64
> >       depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
> >       select ARCH_HAS_CC_PLATFORM
> > -     select KVM_GENERIC_PRIVATE_MEM
> > +     select KVM_GENERIC_GMEM_POPULATE
> >       select HAVE_KVM_ARCH_GMEM_PREPARE
> >       select HAVE_KVM_ARCH_GMEM_INVALIDATE
> >       help
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 755b09dcafce..359baaae5e9f 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2556,7 +2556,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> >  int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
> >  #endif
> >
> > -#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
> > +#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
> >  /**
> >   * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
> >   *
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 49df4e32bff7..559c93ad90be 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -116,7 +116,7 @@ config KVM_GMEM
> >         select XARRAY_MULTI
> >         bool
> >
> > -config KVM_GENERIC_PRIVATE_MEM
> > +config KVM_GENERIC_GMEM_POPULATE
> >         select KVM_GENERIC_MEMORY_ATTRIBUTES
> >         select KVM_GMEM
>
> This is where things really start to break down.  Selecting KVM_GUEST_MEMFD and
> KVM_GENERIC_MEMORY_ATTRIBUTES when KVM_GENERIC_PRIVATE_MEM=y is decent logic.
> *Selecting* KVM_GUEST_MEMFD from a sub-feature of guest_memfd is weird.
>
> I don't love HAVE_KVM_ARCH_GMEM_INVALIDATE and HAVE_KVM_ARCH_GMEM_PREPARE, as I
> think they're too fine-grained.  But that's largely an orthogonal problem, and
> it's not clear that bundling them together would be an improvement.  So, I think
> we should just follow those and add HAVE_KVM_ARCH_GMEM_POPULATE, selected by SEV
> and TDX.
>
> The below diff applies on top.  I'm guessing there may be some intermediate
> ugliness (I haven't mapped out exactly where/how to squash this throughout the
> series, and there is feedback relevant to future patches), but IMO this is a much
> cleaner resting state (see the diff stats).

Just so that I am clear: would applying the diff below to the
appropriate patches address all the concerns that you have mentioned in
this email?

Thanks,
/fuad

> ---
>  arch/arm64/include/asm/kvm_host.h |  5 -----
>  arch/arm64/kvm/Kconfig            |  3 +--
>  arch/x86/include/asm/kvm_host.h   | 15 +-------------
>  arch/x86/kvm/Kconfig              | 10 +++++----
>  arch/x86/kvm/x86.c                | 13 ++++++++++--
>  include/linux/kvm_host.h          | 34 +++++--------------------------
>  virt/kvm/Kconfig                  | 11 +++-------
>  virt/kvm/guest_memfd.c            | 10 +++++----
>  virt/kvm/kvm_main.c               |  8 +++-----
>  9 files changed, 36 insertions(+), 73 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 63f7827cfa1b..3408174ec945 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1674,9 +1674,4 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
>  void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
>  void check_feature_map(void);
>
> -#ifdef CONFIG_KVM_GMEM
> -#define kvm_arch_supports_gmem(kvm) true
> -#define kvm_arch_supports_gmem_mmap(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP)
> -#endif
> -
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 323b46b7c82f..bff62e75d681 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -37,8 +37,7 @@ menuconfig KVM
>         select HAVE_KVM_VCPU_RUN_PID_CHANGE
>         select SCHED_INFO
>         select GUEST_PERF_EVENTS if PERF_EVENTS
> -       select KVM_GMEM
> -       select KVM_GMEM_SUPPORTS_MMAP
> +       select KVM_GUEST_MEMFD
>         help
>           Support hosting virtualized guest machines.
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e1426adfa93e..d93560769465 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2276,21 +2276,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>                        int tdp_max_root_level, int tdp_huge_page_level);
>
>
> -#ifdef CONFIG_KVM_GMEM
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
>  #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
> -#define kvm_arch_supports_gmem(kvm)  ((kvm)->arch.supports_gmem)
> -
> -/*
> - * CoCo VMs with hardware support that use guest_memfd only for backing private
> - * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
> - */
> -#define kvm_arch_supports_gmem_mmap(kvm)               \
> -       (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&   \
> -        (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
> -#else
> -#define kvm_arch_has_private_mem(kvm) false
> -#define kvm_arch_supports_gmem(kvm) false
> -#define kvm_arch_supports_gmem_mmap(kvm) false
>  #endif
>
>  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 2eeffcec5382..afcf8628f615 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -46,8 +46,8 @@ config KVM_X86
>         select HAVE_KVM_PM_NOTIFIER if PM
>         select KVM_GENERIC_HARDWARE_ENABLING
>         select KVM_GENERIC_PRE_FAULT_MEMORY
> -       select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
>         select KVM_WERROR if WERROR
> +       select KVM_GUEST_MEMFD if X86_64
>
>  config KVM
>         tristate "Kernel-based Virtual Machine (KVM) support"
> @@ -84,6 +84,7 @@ config KVM_SW_PROTECTED_VM
>         bool "Enable support for KVM software-protected VMs"
>         depends on EXPERT
>         depends on KVM && X86_64
> +       select KVM_GENERIC_MEMORY_ATTRIBUTES
>         help
>           Enable support for KVM software-protected VMs.  Currently, software-
>           protected VMs are purely a development and testing vehicle for
> @@ -95,8 +96,6 @@ config KVM_SW_PROTECTED_VM
>  config KVM_INTEL
>         tristate "KVM for Intel (and compatible) processors support"
>         depends on KVM && IA32_FEAT_CTL
> -       select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
> -       select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
>         help
>           Provides support for KVM on processors equipped with Intel's VT
>           extensions, a.k.a. Virtual Machine Extensions (VMX).
> @@ -135,6 +134,8 @@ config KVM_INTEL_TDX
>         bool "Intel Trust Domain Extensions (TDX) support"
>         default y
>         depends on INTEL_TDX_HOST
> +       select KVM_GENERIC_MEMORY_ATTRIBUTES
> +       select HAVE_KVM_ARCH_GMEM_POPULATE
>         help
>           Provides support for launching Intel Trust Domain Extensions (TDX)
>           confidential VMs on Intel processors.
> @@ -157,9 +158,10 @@ config KVM_AMD_SEV
>         depends on KVM_AMD && X86_64
>         depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
>         select ARCH_HAS_CC_PLATFORM
> -       select KVM_GENERIC_PRIVATE_MEM
> +       select KVM_GENERIC_MEMORY_ATTRIBUTES
>         select HAVE_KVM_ARCH_GMEM_PREPARE
>         select HAVE_KVM_ARCH_GMEM_INVALIDATE
> +       select HAVE_KVM_ARCH_GMEM_POPULATE
>         help
>           Provides support for launching encrypted VMs which use Secure
>           Encrypted Virtualization (SEV), Secure Encrypted Virtualization with
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ca99187a566e..b6961b4b7aee 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12781,8 +12781,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>
>         kvm->arch.vm_type = type;
>         kvm->arch.has_private_mem = (type == KVM_X86_SW_PROTECTED_VM);
> -       kvm->arch.supports_gmem =
> -               type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
>         /* Decided by the vendor code for other VM types.  */
>         kvm->arch.pre_fault_allowed =
>                 type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
> @@ -13708,6 +13706,16 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
>
> +#ifdef CONFIG_KVM_GUEST_MEMFD
> +/*
> + * KVM doesn't yet support mmap() on guest_memfd for VMs with private memory
> + * (the private vs. shared tracking needs to be moved into guest_memfd).
> + */
> +bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
> +{
> +       return !kvm_arch_has_private_mem(kvm);
> +}
> +
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
>  int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
>  {
> @@ -13721,6 +13729,7 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
>         kvm_x86_call(gmem_invalidate)(start, end);
>  }
>  #endif
> +#endif
>
>  int kvm_spec_ctrl_test_value(u64 value)
>  {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2c1dcd3967d9..a9f31b2b63b1 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -719,39 +719,15 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
>  }
>  #endif
>
> -/*
> - * Arch code must define kvm_arch_has_private_mem if support for guest_memfd is
> - * enabled.
> - */
> -#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> +#ifndef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
>  static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
>  {
>         return false;
>  }
>  #endif
>
> -/*
> - * Arch code must define kvm_arch_supports_gmem if support for guest_memfd is
> - * enabled.
> - */
> -#if !defined(kvm_arch_supports_gmem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> -static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> -{
> -       return false;
> -}
> -#endif
> -
> -/*
> - * Returns true if this VM supports mmap() in guest_memfd.
> - *
> - * Arch code must define kvm_arch_supports_gmem_mmap if support for guest_memfd
> - * is enabled.
> - */
> -#if !defined(kvm_arch_supports_gmem_mmap)
> -static inline bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
> -{
> -       return false;
> -}
> +#ifdef CONFIG_KVM_GUEST_MEMFD
> +bool kvm_arch_supports_gmem_mmap(struct kvm *kvm);
>  #endif
>
>  #ifndef kvm_arch_has_readonly_mem
> @@ -2539,7 +2515,7 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
>
>  static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
>  {
> -       if (!IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP))
> +       if (!IS_ENABLED(CONFIG_KVM_GUEST_MEMFD))
>                 return false;
>
>         return slot->flags & KVM_MEMSLOT_GMEM_ONLY;
> @@ -2596,7 +2572,7 @@ static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
>  int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
>  #endif
>
> -#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
>  /**
>   * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
>   *
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 96cf4ab0d534..9d472f46ebf1 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -112,15 +112,10 @@ config KVM_GENERIC_MEMORY_ATTRIBUTES
>         depends on KVM_GENERIC_MMU_NOTIFIER
>         bool
>
> -config KVM_GMEM
> +config KVM_GUEST_MEMFD
>         select XARRAY_MULTI
>         bool
>
> -config KVM_GENERIC_PRIVATE_MEM
> -       select KVM_GENERIC_MEMORY_ATTRIBUTES
> -       select KVM_GMEM
> -       bool
> -
>  config HAVE_KVM_ARCH_GMEM_PREPARE
>         bool
>         depends on KVM_GMEM
> @@ -129,6 +124,6 @@ config HAVE_KVM_ARCH_GMEM_INVALIDATE
>         bool
>         depends on KVM_GMEM
>
> -config KVM_GMEM_SUPPORTS_MMAP
> -       select KVM_GMEM
> +config HAVE_KVM_ARCH_GMEM_POPULATE
>         bool
> +       depends on KVM_GMEM
> \ No newline at end of file
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index d01bd7a2c2bd..57db0041047a 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -316,9 +316,6 @@ static bool kvm_gmem_supports_mmap(struct inode *inode)
>  {
>         const u64 flags = (u64)inode->i_private;
>
> -       if (!IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP))
> -               return false;
> -
>         return flags & GUEST_MEMFD_FLAG_MMAP;
>  }
>
> @@ -527,6 +524,11 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>         return err;
>  }
>
> +bool __weak kvm_arch_supports_gmem_mmap(struct kvm *kvm)
> +{
> +       return true;
> +}
> +
>  int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  {
>         loff_t size = args->size;
> @@ -730,7 +732,7 @@ int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
>  }
>  EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
>
> -#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
>  long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
>                        kvm_gmem_populate_cb post_populate, void *opaque)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f1ac872e01e9..1b609e35303f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1588,7 +1588,7 @@ static int check_memory_region_flags(struct kvm *kvm,
>  {
>         u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
>
> -       if (kvm_arch_supports_gmem(kvm))
> +       if (IS_ENABLED(CONFIG_KVM_GUEST_MEMFD))
>                 valid_flags |= KVM_MEM_GUEST_MEMFD;
>
>         /* Dirty logging private memory is not currently supported. */
> @@ -4915,10 +4915,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  #endif
>  #ifdef CONFIG_KVM_GMEM
>         case KVM_CAP_GUEST_MEMFD:
> -               return !kvm || kvm_arch_supports_gmem(kvm);
> -#endif
> -#ifdef CONFIG_KVM_GMEM_SUPPORTS_MMAP
> -       case KVM_CAP_GMEM_MMAP:
> +               return 1;
> +       case KVM_CAP_GUEST_MEMFD_MMAP:
>                 return !kvm || kvm_arch_supports_gmem_mmap(kvm);
>  #endif
>         default:
>
> base-commit: 9eba3a9ac9cd5922da7f6e966c01190f909ed640
> --


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-07-21 16:47   ` Sean Christopherson
@ 2025-07-21 16:56     ` Fuad Tabba
  2025-07-22  5:41     ` Xiaoyao Li
  1 sibling, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-21 16:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, 21 Jul 2025 at 17:47, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jul 17, 2025, Fuad Tabba wrote:
> > From: Ackerley Tng <ackerleytng@google.com>
> >
> > Update the KVM MMU fault handler to service guest page faults
> > for memory slots backed by guest_memfd with mmap support. For such
> > slots, the MMU must always fault in pages directly from guest_memfd,
> > bypassing the host's userspace_addr.
> >
> > This ensures that guest_memfd-backed memory is always handled through
> > the guest_memfd specific faulting path, regardless of whether it's for
> > private or non-private (shared) use cases.
> >
> > Additionally, rename kvm_mmu_faultin_pfn_private() to
> > kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in
> > pages from guest_memfd for both private and non-private memory,
> > accommodating the new use cases.
> >
> > Co-developed-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Co-developed-by: Fuad Tabba <tabba@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 94be15cde6da..ad5f337b496c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4511,8 +4511,8 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
> >                                r == RET_PF_RETRY, fault->map_writable);
> >  }
> >
> > -static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > -                                    struct kvm_page_fault *fault)
> > +static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> > +                                 struct kvm_page_fault *fault)
> >  {
> >       int max_order, r;
> >
> > @@ -4536,13 +4536,18 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >       return RET_PF_CONTINUE;
> >  }
> >
> > +static bool fault_from_gmem(struct kvm_page_fault *fault)
>
> Drop the helper.  It has exactly one caller, and it makes the code *harder* to
> read, e.g. raises the question of what "from gmem" even means.  If a separate
> series follows and needs/justifies this helper, then it can/should be added then.

Ack.

Cheers,
/fuad

> > +{
> > +     return fault->is_private || kvm_memslot_is_gmem_only(fault->slot);
> > +}
> > +
> >  static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> >                                struct kvm_page_fault *fault)
> >  {
> >       unsigned int foll = fault->write ? FOLL_WRITE : 0;
> >
> > -     if (fault->is_private)
> > -             return kvm_mmu_faultin_pfn_private(vcpu, fault);
> > +     if (fault_from_gmem(fault))
> > +             return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
> >
> >       foll |= FOLL_NOWAIT;
> >       fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >



* Re: [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem
  2025-07-21 16:45   ` Sean Christopherson
@ 2025-07-21 17:00     ` Fuad Tabba
  2025-07-21 19:09       ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-21 17:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

Hi Sean,

On Mon, 21 Jul 2025 at 17:45, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jul 17, 2025, Fuad Tabba wrote:
> > Introduce a new boolean member, supports_gmem, to kvm->arch.
> >
> > Previously, the has_private_mem boolean within kvm->arch was implicitly
> > used to indicate whether guest_memfd was supported for a KVM instance.
> > However, with the broader support for guest_memfd, it's not exclusively
> > for private or confidential memory. Therefore, it's necessary to
> > distinguish between a VM's general guest_memfd capabilities and its
> > support for private memory.
> >
> > This new supports_gmem member will now explicitly indicate guest_memfd
> > support for a given VM, allowing has_private_mem to represent only
> > support for private memory.
> >
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Reviewed-by: Shivank Garg <shivankg@amd.com>
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Co-developed-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
>
> NAK, this introduces unnecessary potential for bugs, e.g. KVM will get a false
> negative if kvm_arch_supports_gmem() is invoked before kvm_x86_ops.vm_init().
>
> Patch 2 makes this a moot point because kvm_arch_supports_gmem() can simply go away.

Just to reiterate: this is a NAK of the whole patch (which, if I recall
correctly, you had suggested), since the newer patch that you propose
makes both this patch and the function kvm_arch_supports_gmem()
unnecessary.

Fewer patches is fine by me :)

Thanks,
/fuad

> > ---
> >  arch/x86/include/asm/kvm_host.h | 3 ++-
> >  arch/x86/kvm/svm/svm.c          | 1 +
> >  arch/x86/kvm/vmx/tdx.c          | 1 +
> >  arch/x86/kvm/x86.c              | 4 ++--
> >  4 files changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index bde811b2d303..938b5be03d33 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1348,6 +1348,7 @@ struct kvm_arch {
> >       u8 mmu_valid_gen;
> >       u8 vm_type;
> >       bool has_private_mem;
> > +     bool supports_gmem;
> >       bool has_protected_state;
> >       bool pre_fault_allowed;
> >       struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> > @@ -2277,7 +2278,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >
> >  #ifdef CONFIG_KVM_GMEM
> >  #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
> > -#define kvm_arch_supports_gmem(kvm) kvm_arch_has_private_mem(kvm)
> > +#define kvm_arch_supports_gmem(kvm)  ((kvm)->arch.supports_gmem)
> >  #else
> >  #define kvm_arch_has_private_mem(kvm) false
> >  #define kvm_arch_supports_gmem(kvm) false
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index ab9b947dbf4f..d1c484eaa8ad 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -5181,6 +5181,7 @@ static int svm_vm_init(struct kvm *kvm)
> >               to_kvm_sev_info(kvm)->need_init = true;
> >
> >               kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
> > +             kvm->arch.supports_gmem = (type == KVM_X86_SNP_VM);
> >               kvm->arch.pre_fault_allowed = !kvm->arch.has_private_mem;
> >       }
> >
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index f31ccdeb905b..a3db6df245ee 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -632,6 +632,7 @@ int tdx_vm_init(struct kvm *kvm)
> >
> >       kvm->arch.has_protected_state = true;
> >       kvm->arch.has_private_mem = true;
> > +     kvm->arch.supports_gmem = true;
> >       kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;
> >
> >       /*
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 357b9e3a6cef..adbdc2cc97d4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12780,8 +12780,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >               return -EINVAL;
> >
> >       kvm->arch.vm_type = type;
> > -     kvm->arch.has_private_mem =
> > -             (type == KVM_X86_SW_PROTECTED_VM);
> > +     kvm->arch.has_private_mem = (type == KVM_X86_SW_PROTECTED_VM);
> > +     kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
> >       /* Decided by the vendor code for other VM types.  */
> >       kvm->arch.pre_fault_allowed =
> >               type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >



* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 15:07         ` Xiaoyao Li
@ 2025-07-21 17:29           ` Sean Christopherson
  2025-07-21 20:33             ` Vishal Annapurve
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 17:29 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Vishal Annapurve, Fuad Tabba, kvm, linux-arm-msm, linux-mm,
	kvmarm, pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer,
	aou, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025, Xiaoyao Li wrote:
> On 7/21/2025 10:42 PM, Sean Christopherson wrote:
> > However, it actually fails, because kvm_arch_supports_gmem_mmap()
> > > > returns false for TDX VMs, which means userspace cannot allocate gmem
> > > > with mmap just for shared memory for TDX.
> > > 
> > > Why do you want such a usecase to work?
> > 
> > I'm guessing Xiaoyao was asking an honest question in response to finding a
> > perceived flaw when trying to get this all working in QEMU.
> 
> I'm not sure if it is a flaw. That such a usecase is not supported is
> just counter-intuitive to me.
> 
> > > If kvm allows mappable guest_memfd files for TDX VMs without
> > > conversion support, userspace will be able to use those for backing
> > 
> > s/able/unable?
> 
> I think vishal meant "able", because ...
> 
> > > private memory unless:
> > > 1) KVM checks at binding time if the guest_memfd passed during memslot
> > > creation is not a mappable one and doesn't enforce "not mappable"
> > > requirement for TDX VMs at creation time.
> > 
> > Xiaoyao's question is about "just for shared memory", so this is irrelevant for
> > the question at hand.
> 
> ... if we allow gmem mmap for TDX, KVM needs to ensure the mmapable gmem
> should only be passed via userspace_addr. IOW, KVM needs to forbid userspace
> from passing the mmap'able guest_memfd to
> kvm_userspace_memory_region2.guest_memfd, because that would allow
> userspace to access the private memory.

TDX support needs to be gated (and is gated) on private vs. shared being tracked
in guest_memfd.  And that restriction should be (and is) reflected in
KVM_CAP_GUEST_MEMFD_MMAP when invoked on a VM (versus on /dev/kvm).

> 
> > > 2) KVM fetches shared faults through userspace page tables and not
> > > guest_memfd directly.
> > 
> > This is also irrelevant.  KVM _already_ supports resolving shared faults through
> > userspace page tables.  That support won't go away as KVM will always need/want
> > to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).
> > 
> > > I don't see value in trying to go out of way to support such a usecase.
> > 
> > But if/when KVM gains support for tracking shared vs. private in guest_memfd
> > itself, i.e. when TDX _does_ support mmap() on guest_memfd, KVM won't have to go
> > out of its to support using guest_memfd for the @userspace_addr backing store.
> > Unless I'm missing something, the only thing needed to "support" this scenario is:
> 
> As above, we need 1) mentioned by Vishal as well, to prevent userspace from
> passing mmapable guest_memfd to serve as private memory.

Ya, I'm talking specifically about what the world will look like once KVM tracks
private vs. shared in guest_memfd.  I'm not in any way advocating we do this
right now.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP
  2025-07-17 16:27 ` [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP Fuad Tabba
  2025-07-18  6:14   ` Xiaoyao Li
@ 2025-07-21 17:31   ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 17:31 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jul 17, 2025, Fuad Tabba wrote:
> Introduce the new KVM capability KVM_CAP_GMEM_MMAP. This capability
> signals to userspace that a KVM instance supports host userspace mapping
> of guest_memfd-backed memory.
> 
> The availability of this capability is determined per architecture, and
> its enablement for a specific guest_memfd instance is controlled by the
> GUEST_MEMFD_FLAG_MMAP flag at creation time.
> 
> Update the KVM API documentation to detail the KVM_CAP_GMEM_MMAP
> capability, the associated GUEST_MEMFD_FLAG_MMAP, and provide essential
> information regarding support for mmap in guest_memfd.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 9 +++++++++
>  include/uapi/linux/kvm.h       | 1 +
>  virt/kvm/kvm_main.c            | 4 ++++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 43ed57e048a8..5169066b53b2 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6407,6 +6407,15 @@ most one mapping per page, i.e. binding multiple memory regions to a single
>  guest_memfd range is not allowed (any number of memory regions can be bound to
>  a single guest_memfd file, but the bound ranges must not overlap).
>  
> +When the capability KVM_CAP_GMEM_MMAP is supported, the 'flags' field supports
> +GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation enables mmap()
> +and faulting of guest_memfd memory to host userspace.
> +
> +When the KVM MMU performs a PFN lookup to service a guest fault and the backing
> +guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
> +consumed from guest_memfd, regardless of whether it is a shared or a private
> +fault.
> +
>  See KVM_SET_USER_MEMORY_REGION2 for additional details.
>  
>  4.143 KVM_PRE_FAULT_MEMORY
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 3beafbf306af..698dd407980f 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -960,6 +960,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_ARM_EL2 240
>  #define KVM_CAP_ARM_EL2_E2H0 241
>  #define KVM_CAP_RISCV_MP_STATE_RESET 242
> +#define KVM_CAP_GMEM_MMAP 243

KVM_CAP_GUEST_MEMFD_MMAP please.  I definitely don't want "gmem" in any of the
uAPI.

>  struct kvm_irq_routing_irqchip {
>  	__u32 irqchip;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 46bddac1dacd..f1ac872e01e9 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4916,6 +4916,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  #ifdef CONFIG_KVM_GMEM
>  	case KVM_CAP_GUEST_MEMFD:
>  		return !kvm || kvm_arch_supports_gmem(kvm);
> +#endif
> +#ifdef CONFIG_KVM_GMEM_SUPPORTS_MMAP
> +	case KVM_CAP_GMEM_MMAP:

As alluded to in my feedback in patch 2, this should be inside CONFIG_KVM_GUEST_MEMFD.

> +		return !kvm || kvm_arch_supports_gmem_mmap(kvm);
>  #endif
>  	default:
>  		break;
> -- 
> 2.50.0.727.gbf7dc18ff4-goog
> 



* Re: [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-21 16:51     ` Fuad Tabba
@ 2025-07-21 17:33       ` Sean Christopherson
  2025-07-22  9:29         ` Fuad Tabba
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 17:33 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025, Fuad Tabba wrote:
> > The below diff applies on top.  I'm guessing there may be some intermediate
> > ugliness (I haven't mapped out exactly where/how to squash this throughout the
> > series, and there is feedback relevant to future patches), but IMO this is a much
> > cleaner resting state (see the diff stats).
> 
> So just so that I am clear, applying the diff below to the appropriate
> patches would address all the concerns that you have mentioned in this
> email?

Yes?  It should, I just don't want to pinky swear in case I botched something.

But goofs aside, yes, if the end result looks like the below, I'm happy.
Again, things might get ugly in the process, i.e. might be temporarily gross,
but that's ok (within reason).



* Re: [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem
  2025-07-21 17:00     ` Fuad Tabba
@ 2025-07-21 19:09       ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 19:09 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025, Fuad Tabba wrote:
> Hi Sean,
> 
> On Mon, 21 Jul 2025 at 17:45, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Jul 17, 2025, Fuad Tabba wrote:
> > > Introduce a new boolean member, supports_gmem, to kvm->arch.
> > >
> > > Previously, the has_private_mem boolean within kvm->arch was implicitly
> > > used to indicate whether guest_memfd was supported for a KVM instance.
> > > However, with the broader support for guest_memfd, it's not exclusively
> > > for private or confidential memory. Therefore, it's necessary to
> > > distinguish between a VM's general guest_memfd capabilities and its
> > > support for private memory.
> > >
> > > This new supports_gmem member will now explicitly indicate guest_memfd
> > > support for a given VM, allowing has_private_mem to represent only
> > > support for private memory.
> > >
> > > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > > Reviewed-by: Shivank Garg <shivankg@amd.com>
> > > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > > Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Co-developed-by: David Hildenbrand <david@redhat.com>
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > Signed-off-by: Fuad Tabba <tabba@google.com>
> >
> > NAK, this introduces unnecessary potential for bugs, e.g. KVM will get a false
> > negative if kvm_arch_supports_gmem() is invoked before kvm_x86_ops.vm_init().
> >
> > Patch 2 makes this a moot point because kvm_arch_supports_gmem() can simply go away.
> 
> Just to reiterate, this is a NAK to the whole patch

Ya, in effect.  Well, more specifically to adding arch.supports_gmem, not to the
idea of supporting guest_memfd broadly.

> (which if I recall correctly, you had suggested),

Sort of[*].  In that thread, I was reacting to the (ab)use of has_private_mem.
And FWIW, I was envisioning supports_gmem being set in common x86.c super early
on, though what pushed me into NAK territory was seeing the final usage, where
"optimizing" kvm_arch_supports_gmem() isn't worth any amount of complexity.

 : And then rather than rename has_private_mem, either add supports_gmem or do what
 : you did for kvm_arch_supports_gmem_shared_mem() and explicitly check the VM type.

[*] https://lore.kernel.org/all/aEyLlbyMmNEBCAVj@google.com

> since the newer patch that you propose makes this patch, and the function
> kvm_arch_supports_gmem() unnecessary.



* Re: [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level
  2025-07-17 16:27 ` [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level Fuad Tabba
  2025-07-18  6:19   ` Xiaoyao Li
@ 2025-07-21 19:46   ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 19:46 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Thu, Jul 17, 2025, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> Generalize the private_max_mapping_level x86 operation to
> max_mapping_level.

No, this is wrong.  The "private" part can be dropped, but it must not be a fully
generic helper, it needs to be limited to gmem.  For mappings that follow the
primary MMU, the primary MMU is the single source of truth.  It's not just a
nitpick, I got royally confused by the name when looking at the next patch.

s/private_max_mapping_level/gmem_max_mapping_level and we're good.



* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 17:29           ` Sean Christopherson
@ 2025-07-21 20:33             ` Vishal Annapurve
  2025-07-21 22:21               ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Vishal Annapurve @ 2025-07-21 20:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025 at 10:29 AM Sean Christopherson <seanjc@google.com> wrote:
>
> >
> > > > 2) KVM fetches shared faults through userspace page tables and not
> > > > guest_memfd directly.
> > >
> > > This is also irrelevant.  KVM _already_ supports resolving shared faults through
> > > userspace page tables.  That support won't go away as KVM will always need/want
> > > to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).

As a combination of [1] and [2], I believe we are saying that for
memslots backed by mappable guest_memfd files, KVM will always serve
both shared/private faults using kvm_gmem_get_pfn(). And I think the
same story will be carried over when we get the stage2 i.e.
mmap+conversion support.

[1] https://lore.kernel.org/kvm/20250717162731.446579-10-tabba@google.com/
[2] https://lore.kernel.org/kvm/20250717162731.446579-14-tabba@google.com/

> > >
> > > > I don't see value in trying to go out of way to support such a usecase.
> > >
> > > But if/when KVM gains support for tracking shared vs. private in guest_memfd
> > > itself, i.e. when TDX _does_ support mmap() on guest_memfd, KVM won't have to go
> > > out of its way to support using guest_memfd for the @userspace_addr backing store.
> > > Unless I'm missing something, the only thing needed to "support" this scenario is:
> >
> > As above, we need 1) mentioned by Vishal as well, to prevent userspace from
> > passing mmapable guest_memfd to serve as private memory.
>
> Ya, I'm talking specifically about what the world will look like once KVM tracks
> private vs. shared in guest_memfd.  I'm not in any way advocating we do this
> right now.

I think we should generally strive to go towards a single memory backing
for all the scenarios, unless there is a real-world usecase that can't
do without dual memory backing (we should think hard before committing
to supporting it).

Dual memory backing was just a stopgap we needed until the *right*
solution came along.



* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 20:33             ` Vishal Annapurve
@ 2025-07-21 22:21               ` Sean Christopherson
  2025-07-21 23:50                 ` Vishal Annapurve
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 22:21 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Xiaoyao Li, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025, Vishal Annapurve wrote:
> On Mon, Jul 21, 2025 at 10:29 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > >
> > > > > 2) KVM fetches shared faults through userspace page tables and not
> > > > > guest_memfd directly.
> > > >
> > > > This is also irrelevant.  KVM _already_ supports resolving shared faults through
> > > > userspace page tables.  That support won't go away as KVM will always need/want
> > > > to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).
> 
> As a combination of [1] and [2], I believe we are saying that for
> memslots backed by mappable guest_memfd files, KVM will always serve
> both shared/private faults using kvm_gmem_get_pfn(). 

No, KVM can't guarantee that without taking and holding mmap_lock across hva_to_pfn(),
and as I mentioned earlier in the thread, that's a non-starter for me.

For a memslot without a valid slot->gmem.file, slot->userspace_addr will be used
to resolve faults and access guest memory.  By design, KVM has no knowledge of
what lies behind userspace_addr (arm64 and other architectures peek at the VMA,
but x86 does not).  So we can't say that mmap()'d guest_memfd instance will *only*
go through kvm_gmem_get_pfn().


> And I think the same story will be carried over when we get the stage2 i.e.
> mmap+conversion support.
> 
> [1] https://lore.kernel.org/kvm/20250717162731.446579-10-tabba@google.com/
> [2] https://lore.kernel.org/kvm/20250717162731.446579-14-tabba@google.com/
> 
> > > >
> > > > > I don't see value in trying to go out of way to support such a usecase.
> > > >
> > > > But if/when KVM gains support for tracking shared vs. private in guest_memfd
> > > > itself, i.e. when TDX _does_ support mmap() on guest_memfd, KVM won't have to go
> > > > out of its way to support using guest_memfd for the @userspace_addr backing store.
> > > > Unless I'm missing something, the only thing needed to "support" this scenario is:
> > >
> > > As above, we need 1) mentioned by Vishal as well, to prevent userspace from
> > > passing mmapable guest_memfd to serve as private memory.
> >
> > Ya, I'm talking specifically about what the world will look like once KVM tracks
> > private vs. shared in guest_memfd.  I'm not in any way advocating we do this
> > right now.
> 
> I think we should generally strive to go towards single memory backing
> for all the scenarios, unless there is a real world usecase that can't
> do without dual memory backing (We should think hard before committing
> to supporting it).
> 
> Dual memory backing was just a stopgap we needed until the *right*
> solution came along.

I don't think anyone is suggesting otherwise.  I'm just pointing out that what
Xiaoyao is trying to do should Just Work once KVM allows creating mmap()-friendly
guest_memfd instances for TDX.



* Re: [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  2025-07-18  5:10   ` Xiaoyao Li
@ 2025-07-21 23:17     ` Sean Christopherson
  2025-07-22  5:35       ` Xiaoyao Li
  2025-07-22 10:35       ` Fuad Tabba
  0 siblings, 2 replies; 86+ messages in thread
From: Sean Christopherson @ 2025-07-21 23:17 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

[-- Attachment #1: Type: text/plain, Size: 5325 bytes --]

On Fri, Jul 18, 2025, Xiaoyao Li wrote:
> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > From: Ackerley Tng <ackerleytng@google.com>
> > 
> > Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
> > pointer and rename it to kvm_gmem_max_mapping_level().
> > 
> > The max_mapping_level x86 operation (previously private_max_mapping_level)
> > is designed to potentially be called without an active page fault, for
> > instance, when kvm_mmu_max_mapping_level() is determining the maximum
> > mapping level for a gfn proactively.
> > 
> > Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
> > safely handle a NULL fault argument. This aligns its interface with the
> > kvm_x86_ops.max_mapping_level operation it wraps, which can also be
> > called with NULL.
> 
> are you sure of it?
> 
> The patch 09 just added the check of fault->is_private for TDX and SEV.

+1, this isn't quite right.  That's largely my fault (no pun intended) though, as
I suggested the basic gist of the NULL @fault handling, and it's a mess.  More at
the bottom.

> > Rename function to kvm_gmem_max_mapping_level(): This reinforces that
> > the function's scope is for guest_memfd-backed memory, which can be
> > either private or non-private, removing any remaining "private"
> > connotation from its name.
> > 
> > Optimize max_level checks: Introduce a check in the caller to skip
> > querying for max_mapping_level if the current max_level is already
> > PG_LEVEL_4K, as no further reduction is possible.
> > 
> > Acked-by: David Hildenbrand <david@redhat.com>
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >   arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
> >   1 file changed, 7 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index bb925994cbc5..6bd28fda0fd3 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
> >   	return PG_LEVEL_4K;
> >   }
> > -static u8 kvm_max_private_mapping_level(struct kvm *kvm,
> > -					struct kvm_page_fault *fault,
> > -					int gmem_order)
> > +static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> > +				     struct kvm_page_fault *fault)
> >   {
> > -	u8 max_level = fault->max_level;
> >   	u8 req_max_level;
> > +	u8 max_level;
> > -	if (max_level == PG_LEVEL_4K)
> > -		return PG_LEVEL_4K;
> > -
> > -	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> > +	max_level = kvm_max_level_for_order(order);
> >   	if (max_level == PG_LEVEL_4K)
> >   		return PG_LEVEL_4K;
> > @@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >   	}
> >   	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> > -	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
> > +	if (fault->max_level >= PG_LEVEL_4K)
> > +		fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
> > +							      max_order, fault);
> 
> I cannot understand why this change is required. In what case will
> fault->max_level < PG_LEVEL_4K?

Yeah, I don't get this code either.  I also don't think KVM should call
kvm_gmem_max_mapping_level() *here*.  That's mostly a problem with my suggested
NULL @fault handling.  Dealing with kvm_gmem_max_mapping_level() here leads to
weirdness, because kvm_gmem_max_mapping_level() also needs to be invoked for the
!fault path, and then we end up with multiple call sites and the potential for a
redundant call (gmem only, is private).

Looking through surrounding patches, the ordering of things is also "off".
"Generalize private_max_mapping_level x86 op to max_mapping_level" should just
rename the helper; reacting to !is_private memory in TDX belongs in "Consult
guest_memfd when computing max_mapping_level", because that's where KVM plays
nice with non-private memory.

But that patch is also doing too much, e.g. shuffling code around and short-circuiting
the non-fault case, which makes it confusing and hard to review.  Extending gmem
hugepage support to shared memory should be "just" this:

@@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
        if (max_level == PG_LEVEL_4K)
                return PG_LEVEL_4K;
 
-       if (is_private)
-               host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
+       if (is_private || kvm_memslot_is_gmem_only(slot))
+               host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
+                                                       is_private);
        else
                host_level = host_pfn_mapping_level(kvm, gfn, slot);
        return min(host_level, max_level);

plus the plumbing and the small TDX change.  All the renames and code shuffling
should be done in prep patches.

The attached patches are compile-tested only, but I think they get us where we
want to be, and without my confusing suggestion to try and punt on private mappings
in the hugepage recovery paths.  They should slot in at the right patch numbers
(relative to v15).

Holler if the patches don't work, I'm happy to help sort things out so that v16
is ready to go.

[-- Attachment #2: 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch --]
[-- Type: text/x-diff, Size: 6889 bytes --]

From 10fc898f91ded6942f7db2c3b91acaaffd3a56ca Mon Sep 17 00:00:00 2001
From: Ackerley Tng <ackerleytng@google.com>
Date: Thu, 17 Jul 2025 17:27:20 +0100
Subject: [PATCH 10/23] KVM: x86/mmu: Rename .private_max_mapping_level() to
 .gmem_max_mapping_level()

Rename kvm_x86_ops.private_max_mapping_level() to .gmem_max_mapping_level()
in anticipation of extending guest_memfd support to non-private memory.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
[sean: rename only, rewrite changelog accordingly]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h    | 2 +-
 arch/x86/kvm/mmu/mmu.c             | 2 +-
 arch/x86/kvm/svm/sev.c             | 2 +-
 arch/x86/kvm/svm/svm.c             | 2 +-
 arch/x86/kvm/svm/svm.h             | 4 ++--
 arch/x86/kvm/vmx/main.c            | 6 +++---
 arch/x86/kvm/vmx/tdx.c             | 2 +-
 arch/x86/kvm/vmx/x86_ops.h         | 2 +-
 9 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 8d50e3e0a19b..17014d50681b 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -146,7 +146,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
-KVM_X86_OP_OPTIONAL_RET0(private_max_mapping_level)
+KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
 KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 938b5be03d33..1569520e84d2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1907,7 +1907,7 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
-	int (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);
+	int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 213904daf1e5..c5919fca9870 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4479,7 +4479,7 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
+	req_max_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn);
 	if (req_max_level)
 		max_level = min(max_level, req_max_level);
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 687392c5bf5d..81974ae2bc8c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4906,7 +4906,7 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 	}
 }
 
-int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
 {
 	int level, rc;
 	bool assigned;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d1c484eaa8ad..477dc1e3c622 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5347,7 +5347,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.gmem_prepare = sev_gmem_prepare,
 	.gmem_invalidate = sev_gmem_invalidate,
-	.private_max_mapping_level = sev_private_max_mapping_level,
+	.gmem_max_mapping_level = sev_gmem_max_mapping_level,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e6f3c6a153a0..bd7445e0b521 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -787,7 +787,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
-int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
 void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa);
 #else
@@ -816,7 +816,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 	return 0;
 }
 static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
-static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+static inline int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
 {
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1e02e567b57..0c5b66edbf49 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -871,10 +871,10 @@ static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return tdx_vcpu_ioctl(vcpu, argp);
 }
 
-static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+static int vt_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
 {
 	if (is_td(kvm))
-		return tdx_gmem_private_max_mapping_level(kvm, pfn);
+		return tdx_gmem_max_mapping_level(kvm, pfn);
 
 	return 0;
 }
@@ -1044,7 +1044,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.mem_enc_ioctl = vt_op_tdx_only(mem_enc_ioctl),
 	.vcpu_mem_enc_ioctl = vt_op_tdx_only(vcpu_mem_enc_ioctl),
 
-	.private_max_mapping_level = vt_op_tdx_only(gmem_private_max_mapping_level)
+	.gmem_max_mapping_level = vt_op_tdx_only(gmem_max_mapping_level)
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a3db6df245ee..d867a210eba0 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3322,7 +3322,7 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return ret;
 }
 
-int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
 {
 	return PG_LEVEL_4K;
 }
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index b4596f651232..26c6de3d775c 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -163,7 +163,7 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 void tdx_flush_tlb_current(struct kvm_vcpu *vcpu);
 void tdx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
-int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
 #endif
 
 #endif /* __KVM_X86_VMX_X86_OPS_H */
-- 
2.50.0.727.gbf7dc18ff4-goog


[-- Attachment #3: 0011-KVM-x86-mmu-Hoist-guest_memfd-max-level-order-helper.patch --]
[-- Type: text/x-diff, Size: 3220 bytes --]

From 2ff69e6ce989468cb0f86b85ecbc94e2316f0096 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 21 Jul 2025 14:30:51 -0700
Subject: [PATCH 11/23] KVM: x86/mmu: Hoist guest_memfd max level/order helpers
 "up" in mmu.c

Move kvm_max_level_for_order() and kvm_max_private_mapping_level() up in
mmu.c so that they can be used by __kvm_mmu_max_mapping_level().

Opportunistically drop the "inline" from kvm_max_level_for_order().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 72 +++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c5919fca9870..9a0c9b9473d9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3258,6 +3258,42 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
+static u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
+					u8 max_level, int gmem_order)
+{
+	u8 req_max_level;
+
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
+	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
+	req_max_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn);
+	if (req_max_level)
+		max_level = min(max_level, req_max_level);
+
+	return max_level;
+}
+
 static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 				       const struct kvm_memory_slot *slot,
 				       gfn_t gfn, int max_level, bool is_private)
@@ -4450,42 +4486,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 		vcpu->stat.pf_fixed++;
 }
 
-static inline u8 kvm_max_level_for_order(int order)
-{
-	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
-
-	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
-		return PG_LEVEL_1G;
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
-		return PG_LEVEL_2M;
-
-	return PG_LEVEL_4K;
-}
-
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
-					u8 max_level, int gmem_order)
-{
-	u8 req_max_level;
-
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
-
-	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
-
-	req_max_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn);
-	if (req_max_level)
-		max_level = min(max_level, req_max_level);
-
-	return max_level;
-}
-
 static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
 				      struct kvm_page_fault *fault, int r)
 {
-- 
2.50.0.727.gbf7dc18ff4-goog


[-- Attachment #4: 0012-KVM-x86-mmu-Enforce-guest_memfd-s-max-order-when-rec.patch --]
[-- Type: text/x-diff, Size: 7184 bytes --]

From 8855789c66546df41744b500caa3207a67d5fbbc Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 21 Jul 2025 13:44:21 -0700
Subject: [PATCH 12/23] KVM: x86/mmu: Enforce guest_memfd's max order when
 recovering hugepages

Rework kvm_mmu_max_mapping_level() to consult guest_memfd (and relevant
vendor code) when recovering hugepages, e.g. after disabling live migration.
The flaw has existed since guest_memfd was originally added, but has gone
unnoticed due to lack of guest_memfd hugepage support.

Get all information on-demand from the memslot and guest_memfd instance,
even though KVM could pull the pfn from the SPTE.  The max order/level
needs to come from guest_memfd, and using kvm_gmem_get_pfn() avoids adding
a new gmem API, and avoids having to retrieve the pfn and plumb it into
kvm_mmu_max_mapping_level() (the pfn is needed for SNP to consult the RMP).

Note, calling kvm_mem_is_private() in the non-fault path is safe, so long
as mmu_lock is held, as hugepage recovery operates on shadow-present SPTEs,
i.e. calling kvm_mmu_max_mapping_level() with @fault=NULL is mutually
exclusive with kvm_vm_set_mem_attributes() changing the PRIVATE attribute
of the gfn.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 87 +++++++++++++++++++--------------
 arch/x86/kvm/mmu/mmu_internal.h |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  2 +-
 3 files changed, 52 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9a0c9b9473d9..1ff7582d5fae 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3275,31 +3275,55 @@ static u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
-					u8 max_level, int gmem_order)
+static u8 kvm_max_private_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
+					const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	u8 req_max_level;
+	struct page *page;
+	kvm_pfn_t pfn;
+	u8 max_level;
 
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
+	/* For faults, use the gmem information that was resolved earlier. */
+	if (fault) {
+		pfn = fault->pfn;
+		max_level = fault->max_level;
+	} else {
+		/* TODO: Constify the guest_memfd chain. */
+		struct kvm_memory_slot *__slot = (struct kvm_memory_slot *)slot;
+		int max_order, r;
 
-	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
+		r = kvm_gmem_get_pfn(kvm, __slot, gfn, &pfn, &page, &max_order);
+		if (r)
+			return PG_LEVEL_4K;
+
+		if (page)
+			put_page(page);
 
-	req_max_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn);
-	if (req_max_level)
-		max_level = min(max_level, req_max_level);
+		max_level = kvm_max_level_for_order(max_order);
+	}
+
+	if (max_level == PG_LEVEL_4K)
+		return max_level;
 
-	return max_level;
+	return min(max_level,
+		   kvm_x86_call(gmem_max_mapping_level)(kvm, pfn));
 }
 
-static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
-				       const struct kvm_memory_slot *slot,
-				       gfn_t gfn, int max_level, bool is_private)
+int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
+			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	struct kvm_lpage_info *linfo;
-	int host_level;
+	int host_level, max_level;
+	bool is_private;
+
+	lockdep_assert_held(&kvm->mmu_lock);
+
+	if (fault) {
+		max_level = fault->max_level;
+		is_private = fault->is_private;
+	} else {
+		max_level = PG_LEVEL_NUM;
+		is_private = kvm_mem_is_private(kvm, gfn);
+	}
 
 	max_level = min(max_level, max_huge_page_level);
 	for ( ; max_level > PG_LEVEL_4K; max_level--) {
@@ -3308,25 +3332,16 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 			break;
 	}
 
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
 	if (is_private)
-		return max_level;
-
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
-
-	host_level = host_pfn_mapping_level(kvm, gfn, slot);
+		host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
+	else
+		host_level = host_pfn_mapping_level(kvm, gfn, slot);
 	return min(host_level, max_level);
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
-			      const struct kvm_memory_slot *slot, gfn_t gfn)
-{
-	bool is_private = kvm_slot_has_gmem(slot) &&
-			  kvm_mem_is_private(kvm, gfn);
-
-	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
-}
-
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -3347,9 +3362,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * Enforce the iTLB multihit workaround after capturing the requested
 	 * level, which will be used to do precise, accurate accounting.
 	 */
-	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						       fault->gfn, fault->max_level,
-						       fault->is_private);
+	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, fault,
+						     fault->slot, fault->gfn);
 	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
 		return;
 
@@ -4511,8 +4525,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	}
 
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
-	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
-							 fault->max_level, max_order);
+	fault->max_level = kvm_max_level_for_order(max_order);
 
 	return RET_PF_CONTINUE;
 }
@@ -7102,7 +7115,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 * mapping if the indirect sp has level = 1.
 		 */
 		if (sp->role.direct &&
-		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn)) {
+		    sp->role.level < kvm_mmu_max_mapping_level(kvm, NULL, slot, sp->gfn)) {
 			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
 
 			if (kvm_available_flush_remote_tlbs_range())
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index db8f33e4de62..21240e4f1b0d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -408,7 +408,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	return r;
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
+int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
 			      const struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7f3d7229b2c1..740cb06accdb 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1813,7 +1813,7 @@ static void recover_huge_pages_range(struct kvm *kvm,
 		if (iter.gfn < start || iter.gfn >= end)
 			continue;
 
-		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot, iter.gfn);
+		max_mapping_level = kvm_mmu_max_mapping_level(kvm, NULL, slot, iter.gfn);
 		if (max_mapping_level < iter.level)
 			continue;
 
-- 
2.50.0.727.gbf7dc18ff4-goog


[-- Attachment #5: 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch --]
[-- Type: text/x-diff, Size: 7006 bytes --]

From 12a1dc374259e82efd19b930bfaf50ecb5ba9800 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 21 Jul 2025 14:56:50 -0700
Subject: [PATCH 13/23] KVM: x86/mmu: Extend guest_memfd's max mapping level to
 shared mappings

Rework kvm_mmu_max_mapping_level() to consult guest_memfd for all mappings,
not just private mappings, so that hugepage support plays nice with the
upcoming support for backing non-private memory with guest_memfd.

In addition to getting the max order from guest_memfd for gmem-only
memslots, update TDX's hook to effectively ignore shared mappings, as TDX's
restrictions on page size only apply to Secure EPT mappings.  Do nothing
for SNP, as RMP restrictions apply to both private and shared memory.

Suggested-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu/mmu.c          | 12 +++++++-----
 arch/x86/kvm/svm/sev.c          |  2 +-
 arch/x86/kvm/svm/svm.h          |  4 ++--
 arch/x86/kvm/vmx/main.c         |  7 ++++---
 arch/x86/kvm/vmx/tdx.c          |  5 ++++-
 arch/x86/kvm/vmx/x86_ops.h      |  2 +-
 7 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1569520e84d2..ae36973f48a6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1907,7 +1907,7 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
-	int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);
+	int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1ff7582d5fae..2d1894ed1623 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3275,8 +3275,9 @@ static u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
-					const struct kvm_memory_slot *slot, gfn_t gfn)
+static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
+				     const struct kvm_memory_slot *slot, gfn_t gfn,
+				     bool is_private)
 {
 	struct page *page;
 	kvm_pfn_t pfn;
@@ -3305,7 +3306,7 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, struct kvm_page_fault *
 		return max_level;
 
 	return min(max_level,
-		   kvm_x86_call(gmem_max_mapping_level)(kvm, pfn));
+		   kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
 }
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
@@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	if (is_private)
-		host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
+	if (is_private || kvm_memslot_is_gmem_only(slot))
+		host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
+							is_private);
 	else
 		host_level = host_pfn_mapping_level(kvm, gfn, slot);
 	return min(host_level, max_level);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 81974ae2bc8c..c28cf72aa7aa 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4906,7 +4906,7 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 	}
 }
 
-int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
 	int level, rc;
 	bool assigned;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bd7445e0b521..118266bfa46b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -787,7 +787,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
-int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
 void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa);
 #else
@@ -816,7 +816,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 	return 0;
 }
 static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
-static inline int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+static inline int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 0c5b66edbf49..1deeca587b39 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -871,10 +871,11 @@ static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return tdx_vcpu_ioctl(vcpu, argp);
 }
 
-static int vt_gmem_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+static int vt_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
+				     bool is_private)
 {
 	if (is_td(kvm))
-		return tdx_gmem_gmem_max_mapping_level(kvm, pfn);
+		return tdx_gmem_max_mapping_level(kvm, pfn, is_private);
 
 	return 0;
 }
@@ -1044,7 +1045,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.mem_enc_ioctl = vt_op_tdx_only(mem_enc_ioctl),
 	.vcpu_mem_enc_ioctl = vt_op_tdx_only(vcpu_mem_enc_ioctl),
 
-	.gmem_max_mapping_level = vt_op_tdx_only(gmem_gmem_max_mapping_level)
+	.gmem_max_mapping_level = vt_op_tdx_only(gmem_max_mapping_level)
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d867a210eba0..4a1f2c4bdb66 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3322,8 +3322,11 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return ret;
 }
 
-int tdx_gmem_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
+	if (!is_private)
+		return 0;
+
 	return PG_LEVEL_4K;
 }
 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 26c6de3d775c..520d12c304d3 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -163,7 +163,7 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 void tdx_flush_tlb_current(struct kvm_vcpu *vcpu);
 void tdx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
-int tdx_gmem_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 #endif
 
 #endif /* __KVM_X86_VMX_X86_OPS_H */
-- 
2.50.0.727.gbf7dc18ff4-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 22:21               ` Sean Christopherson
@ 2025-07-21 23:50                 ` Vishal Annapurve
  2025-07-22 14:35                   ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Vishal Annapurve @ 2025-07-21 23:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025 at 3:21 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jul 21, 2025, Vishal Annapurve wrote:
> > On Mon, Jul 21, 2025 at 10:29 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > >
> > > > > > 2) KVM fetches shared faults through userspace page tables and not
> > > > > > guest_memfd directly.
> > > > >
> > > > > This is also irrelevant.  KVM _already_ supports resolving shared faults through
> > > > > userspace page tables.  That support won't go away as KVM will always need/want
> > > > > to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).
> >
> > As a combination of [1] and [2], I believe we are saying that for
> > memslots backed by mappable guest_memfd files, KVM will always serve
> > both shared/private faults using kvm_gmem_get_pfn().
>
> No, KVM can't guarantee that without taking and holding mmap_lock across hva_to_pfn(),
> and as I mentioned earlier in the thread, that's a non-starter for me.

I think what you mean is that if KVM wants to enforce the behavior
that VMAs passed by userspace are backed by the same guest_memfd
file as the one passed in the memslot, then KVM will need to hold
mmap_lock across hva_to_pfn() to verify that.

From what I understand of the implementation of [1] & [2], all guest
faults on a memslot backed by a mappable guest_memfd will pass the
fault_from_gmem() check and so will be routed to guest_memfd, i.e. the
hva_to_pfn() path is skipped for any guest faults. If userspace passes
in a VMA mapped by a different guest_memfd file, then the guest and
userspace will have different views of the memory for shared faults.

[1] https://lore.kernel.org/kvm/20250717162731.446579-10-tabba@google.com/
[2] https://lore.kernel.org/kvm/20250717162731.446579-14-tabba@google.com/

>
> For a memslot without a valid slot->gmem.file, slot->userspace_addr will be used
> to resolve faults and access guest memory.  By design, KVM has no knowledge of
> what lies behind userspace_addr (arm64 and other architectures peek at the VMA,
> but x86 does not).  So we can't say that mmap()'d guest_memfd instance will *only*
> go through kvm_gmem_get_pfn().
>
>



* Re: [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  2025-07-21 23:17     ` Sean Christopherson
@ 2025-07-22  5:35       ` Xiaoyao Li
  2025-07-22 11:08         ` Fuad Tabba
  2025-07-22 10:35       ` Fuad Tabba
  1 sibling, 1 reply; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-22  5:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 7/22/2025 7:17 AM, Sean Christopherson wrote:
> On Fri, Jul 18, 2025, Xiaoyao Li wrote:
>> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
>>> From: Ackerley Tng <ackerleytng@google.com>
>>>
>>> Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
>>> pointer and rename it to kvm_gmem_max_mapping_level().
>>>
>>> The max_mapping_level x86 operation (previously private_max_mapping_level)
>>> is designed to potentially be called without an active page fault, for
>>> instance, when kvm_mmu_max_mapping_level() is determining the maximum
>>> mapping level for a gfn proactively.
>>>
>>> Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
>>> safely handle a NULL fault argument. This aligns its interface with the
>>> kvm_x86_ops.max_mapping_level operation it wraps, which can also be
>>> called with NULL.
>>
>> are you sure of it?
>>
>> The patch 09 just added the check of fault->is_private for TDX and SEV.
> 
> +1, this isn't quite right.  That's largely my fault (no pun intended) though, as
> I suggested the basic gist of the NULL @fault handling, and it's a mess.  More at
> the bottom.
> 
>>> Rename function to kvm_gmem_max_mapping_level(): This reinforces that
>>> the function's scope is for guest_memfd-backed memory, which can be
>>> either private or non-private, removing any remaining "private"
>>> connotation from its name.
>>>
>>> Optimize max_level checks: Introduce a check in the caller to skip
>>> querying for max_mapping_level if the current max_level is already
>>> PG_LEVEL_4K, as no further reduction is possible.
>>>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>> Suggested-by: Sean Christopherson <seanjc@google.com>
>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>>    arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
>>>    1 file changed, 7 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index bb925994cbc5..6bd28fda0fd3 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
>>>    	return PG_LEVEL_4K;
>>>    }
>>> -static u8 kvm_max_private_mapping_level(struct kvm *kvm,
>>> -					struct kvm_page_fault *fault,
>>> -					int gmem_order)
>>> +static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
>>> +				     struct kvm_page_fault *fault)
>>>    {
>>> -	u8 max_level = fault->max_level;
>>>    	u8 req_max_level;
>>> +	u8 max_level;
>>> -	if (max_level == PG_LEVEL_4K)
>>> -		return PG_LEVEL_4K;
>>> -
>>> -	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
>>> +	max_level = kvm_max_level_for_order(order);
>>>    	if (max_level == PG_LEVEL_4K)
>>>    		return PG_LEVEL_4K;
>>> @@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
>>>    	}
>>>    	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
>>> -	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
>>> +	if (fault->max_level >= PG_LEVEL_4K)
>>> +		fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
>>> +							      max_order, fault);
>>
>> I cannot understand why this change is required. In what case will
>> fault->max_level < PG_LEVEL_4K?
> 
> Yeah, I don't get this code either.  I also don't think KVM should call
> kvm_gmem_max_mapping_level() *here*.  That's mostly a problem with my suggested
> NULL @fault handling.  Dealing with kvm_gmem_max_mapping_level() here leads to
> weirdness, because kvm_gmem_max_mapping_level() also needs to be invoked for the
> !fault path, and then we end up with multiple call sites and the potential for a
> redundant call (gmem only, is private).
> 
> Looking through surrounding patches, the ordering of things is also "off".
> "Generalize private_max_mapping_level x86 op to max_mapping_level" should just
> rename the helper; reacting to !is_private memory in TDX belongs in "Consult
> guest_memfd when computing max_mapping_level", because that's where KVM plays
> nice with non-private memory.
> 
> But that patch is also doing too much, e.g. shuffling code around and short-circuiting
> the non-fault case, which makes it confusing and hard to review.  Extending gmem
> hugepage support to shared memory should be "just" this:
> 
> @@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
>          if (max_level == PG_LEVEL_4K)
>                  return PG_LEVEL_4K;
>   
> -       if (is_private)
> -               host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
> +       if (is_private || kvm_memslot_is_gmem_only(slot))
> +               host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
> +                                                       is_private);
>          else
>                  host_level = host_pfn_mapping_level(kvm, gfn, slot);
>          return min(host_level, max_level);
> 
> plus the plumbing and the small TDX change.  All the renames and code shuffling
> should be done in prep patches.
> 
> The attached patches are compile-tested only, but I think they get us where we
> want to be, and without my confusing suggestion to try and punt on private mappings
> in the hugepage recovery paths.  They should slot it at the right patch numbers
> (relative to v15).
> 
> Holler if the patches don't work, I'm happy to help sort things out so that v16
> is ready to go.

I have some feedback though the attached patches function well.

- In 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch,
  there is a doubled "gmem" in the names of the vmx/vt callback
  implementations:

     vt_gmem_gmem_max_mapping_level
     tdx_gmem_gmem_max_mapping_level
     vt_op_tdx_only(gmem_gmem_max_mapping_level)

- In 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch,
   kvm_x86_call(gmem_max_mapping_level)(...) returns 0 for the !private
   case.  It's not correct, though it works without issue currently.

   Because current gmem doesn't support hugepage so that the max_level
   gotten from gmem is always PG_LEVEL_4K and it returns early in
   kvm_gmem_max_mapping_level() on

	if (max_level == PG_LEVEL_4K)
		return max_level;

   But just look at the following case:

     return min(max_level,
	kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));

   For the non-TDX and non-SNP cases, it will return 0, i.e.
   PG_LEVEL_NONE, eventually.

   so either 1) return PG_LEVEL_NUM/PG_LEVEL_1G for the cases where
   the .gmem_max_mapping_level callback doesn't have a specific restriction,

   or 2)

	tmp = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
	if (tmp)
		return min(max_level, tmp);

	return max_level;




* Re: [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-07-21 16:47   ` Sean Christopherson
  2025-07-21 16:56     ` Fuad Tabba
@ 2025-07-22  5:41     ` Xiaoyao Li
  2025-07-22  8:43       ` Fuad Tabba
  1 sibling, 1 reply; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-22  5:41 UTC (permalink / raw)
  To: Sean Christopherson, Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata,
	mic, vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 7/22/2025 12:47 AM, Sean Christopherson wrote:
> On Thu, Jul 17, 2025, Fuad Tabba wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>>
>> Update the KVM MMU fault handler to service guest page faults
>> for memory slots backed by guest_memfd with mmap support. For such
>> slots, the MMU must always fault in pages directly from guest_memfd,
>> bypassing the host's userspace_addr.
>>
>> This ensures that guest_memfd-backed memory is always handled through
>> the guest_memfd specific faulting path, regardless of whether it's for
>> private or non-private (shared) use cases.
>>
>> Additionally, rename kvm_mmu_faultin_pfn_private() to
>> kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in
>> pages from guest_memfd for both private and non-private memory,
>> accommodating the new use cases.
>>
>> Co-developed-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>> Co-developed-by: Fuad Tabba <tabba@google.com>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>> ---
>>   arch/x86/kvm/mmu/mmu.c | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 94be15cde6da..ad5f337b496c 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -4511,8 +4511,8 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
>>   				 r == RET_PF_RETRY, fault->map_writable);
>>   }
>>   
>> -static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
>> -				       struct kvm_page_fault *fault)
>> +static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
>> +				    struct kvm_page_fault *fault)
>>   {
>>   	int max_order, r;
>>   
>> @@ -4536,13 +4536,18 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
>>   	return RET_PF_CONTINUE;
>>   }
>>   
>> +static bool fault_from_gmem(struct kvm_page_fault *fault)
> 
> Drop the helper.  It has exactly one caller, and it makes the code *harder* to
> read, e.g. raises the question of what "from gmem" even means.  If a separate
> series follows and needs/justifies this helper, then it can/should be added then.

there is another place that requires the same check, introduced by your
"KVM: x86/mmu: Extend guest_memfd's max mapping level to shared
mappings" patch provided in [*]

[*] https://lore.kernel.org/kvm/aH7KghhsjaiIL3En@google.com/

---
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1ff7582d5fae..2d1894ed1623 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c

@@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, 
struct kvm_page_fault *fault,
         if (max_level == PG_LEVEL_4K)
                 return PG_LEVEL_4K;

-       if (is_private)
-               host_level = kvm_max_private_mapping_level(kvm, fault, 
slot, gfn);
+       if (is_private || kvm_memslot_is_gmem_only(slot))
+               host_level = kvm_gmem_max_mapping_level(kvm, fault, 
slot, gfn,
+                                                       is_private);
         else
                 host_level = host_pfn_mapping_level(kvm, gfn, slot);
         return min(host_level, max_level);

>> +{
>> +	return fault->is_private || kvm_memslot_is_gmem_only(fault->slot);
>> +}
>> +
>>   static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>>   				 struct kvm_page_fault *fault)
>>   {
>>   	unsigned int foll = fault->write ? FOLL_WRITE : 0;
>>   
>> -	if (fault->is_private)
>> -		return kvm_mmu_faultin_pfn_private(vcpu, fault);
>> +	if (fault_from_gmem(fault))
>> +		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>>   
>>   	foll |= FOLL_NOWAIT;
>>   	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
>> -- 
>> 2.50.0.727.gbf7dc18ff4-goog
>>



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
  2025-07-22  5:41     ` Xiaoyao Li
@ 2025-07-22  8:43       ` Fuad Tabba
  0 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-22  8:43 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On Tue, 22 Jul 2025 at 06:41, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 7/22/2025 12:47 AM, Sean Christopherson wrote:
> > On Thu, Jul 17, 2025, Fuad Tabba wrote:
> >> From: Ackerley Tng <ackerleytng@google.com>
> >>
> >> Update the KVM MMU fault handler to service guest page faults
> >> for memory slots backed by guest_memfd with mmap support. For such
> >> slots, the MMU must always fault in pages directly from guest_memfd,
> >> bypassing the host's userspace_addr.
> >>
> >> This ensures that guest_memfd-backed memory is always handled through
> >> the guest_memfd specific faulting path, regardless of whether it's for
> >> private or non-private (shared) use cases.
> >>
> >> Additionally, rename kvm_mmu_faultin_pfn_private() to
> >> kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in
> >> pages from guest_memfd for both private and non-private memory,
> >> accommodating the new use cases.
> >>
> >> Co-developed-by: David Hildenbrand <david@redhat.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >> Co-developed-by: Fuad Tabba <tabba@google.com>
> >> Signed-off-by: Fuad Tabba <tabba@google.com>
> >> ---
> >>   arch/x86/kvm/mmu/mmu.c | 13 +++++++++----
> >>   1 file changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >> index 94be15cde6da..ad5f337b496c 100644
> >> --- a/arch/x86/kvm/mmu/mmu.c
> >> +++ b/arch/x86/kvm/mmu/mmu.c
> >> @@ -4511,8 +4511,8 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
> >>                               r == RET_PF_RETRY, fault->map_writable);
> >>   }
> >>
> >> -static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >> -                                   struct kvm_page_fault *fault)
> >> +static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> >> +                                struct kvm_page_fault *fault)
> >>   {
> >>      int max_order, r;
> >>
> >> @@ -4536,13 +4536,18 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >>      return RET_PF_CONTINUE;
> >>   }
> >>
> >> +static bool fault_from_gmem(struct kvm_page_fault *fault)
> >
> > Drop the helper.  It has exactly one caller, and it makes the code *harder* to
> > read, e.g. raises the question of what "from gmem" even means.  If a separate
> > series follows and needs/justifies this helper, then it can/should be added then.
>
> there is another place that requires the same check, introduced by your
> "KVM: x86/mmu: Extend guest_memfd's max mapping level to shared
> mappings" patch provided in [*]
>
> [*] https://lore.kernel.org/kvm/aH7KghhsjaiIL3En@google.com/

I guess this is meant for the patch [*], which, as far as I can tell,
isn't in the latest (-rc7) yet.

Thanks,
/fuad

> ---
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1ff7582d5fae..2d1894ed1623 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
>
> @@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> struct kvm_page_fault *fault,
>          if (max_level == PG_LEVEL_4K)
>                  return PG_LEVEL_4K;
>
> -       if (is_private)
> -               host_level = kvm_max_private_mapping_level(kvm, fault,
> slot, gfn);
> +       if (is_private || kvm_memslot_is_gmem_only(slot))
> +               host_level = kvm_gmem_max_mapping_level(kvm, fault,
> slot, gfn,
> +                                                       is_private);
>          else
>                  host_level = host_pfn_mapping_level(kvm, gfn, slot);
>          return min(host_level, max_level);
>
> >> +{
> >> +    return fault->is_private || kvm_memslot_is_gmem_only(fault->slot);
> >> +}
> >> +
> >>   static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> >>                               struct kvm_page_fault *fault)
> >>   {
> >>      unsigned int foll = fault->write ? FOLL_WRITE : 0;
> >>
> >> -    if (fault->is_private)
> >> -            return kvm_mmu_faultin_pfn_private(vcpu, fault);
> >> +    if (fault_from_gmem(fault))
> >> +            return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
> >>
> >>      foll |= FOLL_NOWAIT;
> >>      fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
> >> --
> >> 2.50.0.727.gbf7dc18ff4-goog
> >>
>
>



* Re: [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-21 17:33       ` Sean Christopherson
@ 2025-07-22  9:29         ` Fuad Tabba
  2025-07-22 15:58           ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-22  9:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, 21 Jul 2025 at 18:33, Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jul 21, 2025, Fuad Tabba wrote:
> > > The below diff applies on top.  I'm guessing there may be some intermediate
> > > ugliness (I haven't mapped out exactly where/how to squash this throughout the
> > > series, and there is feedback relevant to future patches), but IMO this is a much
> > > cleaner resting state (see the diff stats).
> >
> > So just so that I am clear, applying the diff below to the appropriate
> > patches would address all the concerns that you have mentioned in this
> > email?
>
> Yes?  It should, I just don't want to pinky swear in case I botched something.

Other than this patch not applying, nah, I think it's all good ;P. I
guess base-commit: 9eba3a9ac9cd5922da7f6e966c01190f909ed640 is
somewhere in a local tree of yours. There are quite a few conflicts,
and I don't think it would build even if based on the right tree,
e.g., KVM_CAP_GUEST_MEMFD_MMAP is a rename of KVM_CAP_GMEM_MMAP,
rather than the addition of an undeclared identifier.

That said, I think I understand what you mean, and I can apply the
spirit of this patch.

Stay tuned for v16.
/fuad

> But goofs aside, yes, if the end result looks like what was the below, I'm happy.
> Again, things might get ugly in the process, i.e. might be temporariliy gross,
> but that's ok (within reason).



* Re: [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  2025-07-21 23:17     ` Sean Christopherson
  2025-07-22  5:35       ` Xiaoyao Li
@ 2025-07-22 10:35       ` Fuad Tabba
  1 sibling, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-22 10:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, 22 Jul 2025 at 00:17, Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Jul 18, 2025, Xiaoyao Li wrote:
> > On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > > From: Ackerley Tng <ackerleytng@google.com>
> > >
> > > Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
> > > pointer and rename it to kvm_gmem_max_mapping_level().
> > >
> > > The max_mapping_level x86 operation (previously private_max_mapping_level)
> > > is designed to potentially be called without an active page fault, for
> > > instance, when kvm_mmu_max_mapping_level() is determining the maximum
> > > mapping level for a gfn proactively.
> > >
> > > Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
> > > safely handle a NULL fault argument. This aligns its interface with the
> > > kvm_x86_ops.max_mapping_level operation it wraps, which can also be
> > > called with NULL.
> >
> > are you sure of it?
> >
> > The patch 09 just added the check of fault->is_private for TDX and SEV.
>
> +1, this isn't quite right.  That's largely my fault (no pun intended) though, as
> I suggested the basic gist of the NULL @fault handling, and it's a mess.  More at
> the bottom.
>
> > > Rename function to kvm_gmem_max_mapping_level(): This reinforces that
> > > the function's scope is for guest_memfd-backed memory, which can be
> > > either private or non-private, removing any remaining "private"
> > > connotation from its name.
> > >
> > > Optimize max_level checks: Introduce a check in the caller to skip
> > > querying for max_mapping_level if the current max_level is already
> > > PG_LEVEL_4K, as no further reduction is possible.
> > >
> > > Acked-by: David Hildenbrand <david@redhat.com>
> > > Suggested-by: Sean Christopherson <seanjc@google.com>
> > > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > ---
> > >   arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
> > >   1 file changed, 7 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index bb925994cbc5..6bd28fda0fd3 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
> > >     return PG_LEVEL_4K;
> > >   }
> > > -static u8 kvm_max_private_mapping_level(struct kvm *kvm,
> > > -                                   struct kvm_page_fault *fault,
> > > -                                   int gmem_order)
> > > +static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> > > +                                struct kvm_page_fault *fault)
> > >   {
> > > -   u8 max_level = fault->max_level;
> > >     u8 req_max_level;
> > > +   u8 max_level;
> > > -   if (max_level == PG_LEVEL_4K)
> > > -           return PG_LEVEL_4K;
> > > -
> > > -   max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> > > +   max_level = kvm_max_level_for_order(order);
> > >     if (max_level == PG_LEVEL_4K)
> > >             return PG_LEVEL_4K;
> > > @@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > >     }
> > >     fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> > > -   fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
> > > +   if (fault->max_level >= PG_LEVEL_4K)
> > > +           fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
> > > +                                                         max_order, fault);
> >
> > I cannot understand why this change is required. In what case will
> > fault->max_level < PG_LEVEL_4K?
>
> Yeah, I don't get this code either.  I also don't think KVM should call
> kvm_gmem_max_mapping_level() *here*.  That's mostly a problem with my suggested
> NULL @fault handling.  Dealing with kvm_gmem_max_mapping_level() here leads to
> weirdness, because kvm_gmem_max_mapping_level() also needs to be invoked for the
> !fault path, and then we end up with multiple call sites and the potential for a
> redundant call (gmem only, is private).
>
> Looking through surrounding patches, the ordering of things is also "off".
> "Generalize private_max_mapping_level x86 op to max_mapping_level" should just
> rename the helper; reacting to !is_private memory in TDX belongs in "Consult
> guest_memfd when computing max_mapping_level", because that's where KVM plays
> nice with non-private memory.
>
> But that patch is also doing too much, e.g. shuffling code around and short-circuiting
> the non-fault case, which makes it confusing and hard to review.  Extending gmem
> hugepage support to shared memory should be "just" this:
>
> @@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
>         if (max_level == PG_LEVEL_4K)
>                 return PG_LEVEL_4K;
>
> -       if (is_private)
> -               host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
> +       if (is_private || kvm_memslot_is_gmem_only(slot))
> +               host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
> +                                                       is_private);
>         else
>                 host_level = host_pfn_mapping_level(kvm, gfn, slot);
>         return min(host_level, max_level);
>
> plus the plumbing and the small TDX change.  All the renames and code shuffling
> should be done in prep patches.
>
> The attached patches are compile-tested only, but I think they get us where we
> want to be, and without my confusing suggestion to try and punt on private mappings
> in the hugepage recovery paths.  They should slot in at the right patch numbers
> (relative to v15).
>
> Holler if the patches don't work, I'm happy to help sort things out so that v16
> is ready to go.

These patches apply, build, and run. I'll incorporate them, test them
a bit more with allmodconfig and friends, along with the other patch
that you suggested, and respin v16 soon.

Cheers,
/fuad



* Re: [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  2025-07-22  5:35       ` Xiaoyao Li
@ 2025-07-22 11:08         ` Fuad Tabba
  2025-07-22 14:32           ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-22 11:08 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On Tue, 22 Jul 2025 at 06:36, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 7/22/2025 7:17 AM, Sean Christopherson wrote:
> > On Fri, Jul 18, 2025, Xiaoyao Li wrote:
> >> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> >>> From: Ackerley Tng <ackerleytng@google.com>
> >>>
> >>> Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
> >>> pointer and rename it to kvm_gmem_max_mapping_level().
> >>>
> >>> The max_mapping_level x86 operation (previously private_max_mapping_level)
> >>> is designed to potentially be called without an active page fault, for
> >>> instance, when kvm_mmu_max_mapping_level() is determining the maximum
> >>> mapping level for a gfn proactively.
> >>>
> >>> Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
> >>> safely handle a NULL fault argument. This aligns its interface with the
> >>> kvm_x86_ops.max_mapping_level operation it wraps, which can also be
> >>> called with NULL.
> >>
> >> are you sure of it?
> >>
> >> The patch 09 just added the check of fault->is_private for TDX and SEV.
> >
> > +1, this isn't quite right.  That's largely my fault (no pun intended) though, as
> > I suggested the basic gist of the NULL @fault handling, and it's a mess.  More at
> > the bottom.
> >
> >>> Rename function to kvm_gmem_max_mapping_level(): This reinforces that
> >>> the function's scope is for guest_memfd-backed memory, which can be
> >>> either private or non-private, removing any remaining "private"
> >>> connotation from its name.
> >>>
> >>> Optimize max_level checks: Introduce a check in the caller to skip
> >>> querying for max_mapping_level if the current max_level is already
> >>> PG_LEVEL_4K, as no further reduction is possible.
> >>>
> >>> Acked-by: David Hildenbrand <david@redhat.com>
> >>> Suggested-by: Sean Christopherson <seanjc@google.com>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>> ---
> >>>    arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
> >>>    1 file changed, 7 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >>> index bb925994cbc5..6bd28fda0fd3 100644
> >>> --- a/arch/x86/kvm/mmu/mmu.c
> >>> +++ b/arch/x86/kvm/mmu/mmu.c
> >>> @@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
> >>>     return PG_LEVEL_4K;
> >>>    }
> >>> -static u8 kvm_max_private_mapping_level(struct kvm *kvm,
> >>> -                                   struct kvm_page_fault *fault,
> >>> -                                   int gmem_order)
> >>> +static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> >>> +                                struct kvm_page_fault *fault)
> >>>    {
> >>> -   u8 max_level = fault->max_level;
> >>>     u8 req_max_level;
> >>> +   u8 max_level;
> >>> -   if (max_level == PG_LEVEL_4K)
> >>> -           return PG_LEVEL_4K;
> >>> -
> >>> -   max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> >>> +   max_level = kvm_max_level_for_order(order);
> >>>     if (max_level == PG_LEVEL_4K)
> >>>             return PG_LEVEL_4K;
> >>> @@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >>>     }
> >>>     fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> >>> -   fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
> >>> +   if (fault->max_level >= PG_LEVEL_4K)
> >>> +           fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
> >>> +                                                         max_order, fault);
> >>
> >> I cannot understand why this change is required. In what case will
> >> fault->max_level < PG_LEVEL_4K?
> >
> > Yeah, I don't get this code either.  I also don't think KVM should call
> > kvm_gmem_max_mapping_level() *here*.  That's mostly a problem with my suggested
> > NULL @fault handling.  Dealing with kvm_gmem_max_mapping_level() here leads to
> > weirdness, because kvm_gmem_max_mapping_level() also needs to be invoked for the
> > !fault path, and then we end up with multiple call sites and the potential for a
> > redundant call (gmem only, is private).
> >
> > Looking through surrounding patches, the ordering of things is also "off".
> > "Generalize private_max_mapping_level x86 op to max_mapping_level" should just
> > rename the helper; reacting to !is_private memory in TDX belongs in "Consult
> > guest_memfd when computing max_mapping_level", because that's where KVM plays
> > nice with non-private memory.
> >
> > But that patch is also doing too much, e.g. shuffling code around and short-circuiting
> > the non-fault case, which makes it confusing and hard to review.  Extending gmem
> > hugepage support to shared memory should be "just" this:
> >
> > @@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
> >          if (max_level == PG_LEVEL_4K)
> >                  return PG_LEVEL_4K;
> >
> > -       if (is_private)
> > -               host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
> > +       if (is_private || kvm_memslot_is_gmem_only(slot))
> > +               host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
> > +                                                       is_private);
> >          else
> >                  host_level = host_pfn_mapping_level(kvm, gfn, slot);
> >          return min(host_level, max_level);
> >
> > plus the plumbing and the small TDX change.  All the renames and code shuffling
> > should be done in prep patches.
> >
> > The attached patches are compile-tested only, but I think they get us where we
> > want to be, and without my confusing suggestion to try and punt on private mappings
> > in the hugepage recovery paths.  They should slot in at the right patch numbers
> > (relative to v15).
> >
> > Holler if the patches don't work, I'm happy to help sort things out so that v16
> > is ready to go.
>
> I have some feedback, though the attached patches function well.
>
> - In 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch,
> there is double gmem in the name of vmx/vt 's callback implementation:
>
>      vt_gmem_gmem_max_mapping_level
>      tdx_gmem_gmem_max_mapping_level
>      vt_op_tdx_only(gmem_gmem_max_mapping_level)

Sean's patches do that, then he fixes it in a later patch. I'll fix
this at the source.

> - In 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch,
>    kvm_x86_call(gmem_max_mapping_level)(...) returns 0 for the !private case.
>    It's not correct, though it works without issue currently.
>
>    Because current gmem doesn't support hugepage so that the max_level
>    gotten from gmem is always PG_LEVEL_4K and it returns early in
>    kvm_gmem_max_mapping_level() on
>
>         if (max_level == PG_LEVEL_4K)
>                 return max_level;
>
>    But just look at the following case:
>
>      return min(max_level,
>         kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
>
>    For the non-TDX and non-SNP cases, it will return 0, i.e.
>    PG_LEVEL_NONE, eventually.
>
>    So either 1) return PG_LEVEL_NUM/PG_LEVEL_1G for the cases where the
>    .gmem_max_mapping_level callback doesn't have a specific restriction;
>
>    or 2)
>
>         tmp = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
>         if (tmp)
>                 return min(max_level, tmp);
>
>         return max_level;

Sean? What do you think?

Thanks!
/fuad



* Re: [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-07-17 16:27 ` [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
@ 2025-07-22 12:31   ` Kunwu Chan
  2025-07-23  8:20     ` Marc Zyngier
  2025-07-23  8:26   ` Marc Zyngier
  1 sibling, 1 reply; 86+ messages in thread
From: Kunwu Chan @ 2025-07-22 12:31 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
	ira.weiny

On 2025/7/18 00:27, Fuad Tabba wrote:
> Add arm64 architecture support for handling guest page faults on memory
> slots backed by guest_memfd.
>
> This change introduces a new function, gmem_abort(), which encapsulates
> the fault handling logic specific to guest_memfd-backed memory. The
> kvm_handle_guest_abort() entry point is updated to dispatch to
> gmem_abort() when a fault occurs on a guest_memfd-backed memory slot (as
> determined by kvm_slot_has_gmem()).
>
> Until guest_memfd gains support for huge pages, the fault granule for
> these memory regions is restricted to PAGE_SIZE.

Since huge pages are not currently supported, would it be more friendly
to define something like

"#define GMEM_PAGE_GRANULE PAGE_SIZE" at the top (rather than hardcoding
PAGE_SIZE), to make it easier to switch to huge page support later?

> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>   arch/arm64/kvm/mmu.c | 86 ++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 83 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index b3eacb400fab..8c82df80a835 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1512,6 +1512,82 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
>   	*prot |= kvm_encode_nested_level(nested);
>   }
>   
> +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> +
> +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +		      struct kvm_s2_trans *nested,
> +		      struct kvm_memory_slot *memslot, bool is_perm)
> +{
> +	bool write_fault, exec_fault, writable;
> +	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> +	unsigned long mmu_seq;
> +	struct page *page;
> +	struct kvm *kvm = vcpu->kvm;
> +	void *memcache;
> +	kvm_pfn_t pfn;
> +	gfn_t gfn;
> +	int ret;
> +
> +	ret = prepare_mmu_memcache(vcpu, true, &memcache);
> +	if (ret)
> +		return ret;
> +
> +	if (nested)
> +		gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> +	else
> +		gfn = fault_ipa >> PAGE_SHIFT;
> +
> +	write_fault = kvm_is_write_fault(vcpu);
> +	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> +
> +	VM_WARN_ON_ONCE(write_fault && exec_fault);
> +
> +	mmu_seq = kvm->mmu_invalidate_seq;
> +	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
> +	smp_rmb();
> +
> +	ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> +	if (ret) {
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> +					      write_fault, exec_fault, false);
> +		return ret;
> +	}
> +
> +	writable = !(memslot->flags & KVM_MEM_READONLY);
> +
> +	if (nested)
> +		adjust_nested_fault_perms(nested, &prot, &writable);
> +
> +	if (writable)
> +		prot |= KVM_PGTABLE_PROT_W;
> +
> +	if (exec_fault ||
> +	    (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
> +	     (!nested || kvm_s2_trans_executable(nested))))
> +		prot |= KVM_PGTABLE_PROT_X;
> +
> +	kvm_fault_lock(kvm);
> +	if (mmu_invalidate_retry(kvm, mmu_seq)) {
> +		ret = -EAGAIN;
> +		goto out_unlock;
> +	}
> +
> +	ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
> +						 __pfn_to_phys(pfn), prot,
> +						 memcache, flags);
> +
> +out_unlock:
> +	kvm_release_faultin_page(kvm, page, !!ret, writable);
> +	kvm_fault_unlock(kvm);
> +
> +	if (writable && !ret)
> +		mark_page_dirty_in_slot(kvm, memslot, gfn);
> +
> +	return ret != -EAGAIN ? ret : 0;
> +}
> +
>   static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			  struct kvm_s2_trans *nested,
>   			  struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1536,7 +1612,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>   	struct kvm_pgtable *pgt;
>   	struct page *page;
> -	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
> +	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
>   
>   	if (fault_is_perm)
>   		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
> @@ -1961,8 +2037,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>   	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
>   			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
>   
> -	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> -			     esr_fsc_is_permission_fault(esr));
> +	if (kvm_slot_has_gmem(memslot))
> +		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> +				 esr_fsc_is_permission_fault(esr));
> +	else
> +		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> +				     esr_fsc_is_permission_fault(esr));
>   	if (ret == 0)
>   		ret = 1;
>   out:
LGTM!

-- 
Thanks,
   Kunwu.Chan(Tao.Chan)




* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 12:22   ` Xiaoyao Li
  2025-07-21 12:41     ` Fuad Tabba
  2025-07-21 13:45     ` Vishal Annapurve
@ 2025-07-22 14:28     ` Xiaoyao Li
  2025-07-22 14:37       ` Sean Christopherson
  2 siblings, 1 reply; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-22 14:28 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 7/21/2025 8:22 PM, Xiaoyao Li wrote:
> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
>> +/*
>> + * CoCo VMs with hardware support that use guest_memfd only for 
>> backing private
>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping 
>> enabled.
>> + */
>> +#define kvm_arch_supports_gmem_mmap(kvm)        \
>> +    (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&    \
>> +     (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
> 
> I want to share the findings when I do the POC to enable gmem mmap in QEMU.
> 
> Actually, QEMU can use gmem with mmap support as the normal memory even 
> without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd 
> on KVM_SET_USER_MEMORY_REGION2.
> 
> Since the gmem is mmapable, QEMU can pass the userspace addr got from 
> mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It 
> works well for non-coco VMs on x86.

One more finding.

I tested QEMU by creating normal (non-private) memory backed by a mmapable 
guest memfd, and forcibly passing the gmem fd in struct 
kvm_userspace_memory_region2 when QEMU sets up the memory regions.


It hits the kvm_gmem_bind() error, since QEMU tries to back different GPA 
regions with the same gmem.

So the question is: do we want to allow multi-binding for shared-only 
gmem?

> Then it seems feasible to use gmem with mmap for the shared memory of 
> TDX, and an additional gmem without mmap for the private memory. i.e.,
> For struct kvm_userspace_memory_region, the @userspace_addr is passed 
> with the uaddr returned from gmem0 with mmap, while @guest_memfd is 
> passed with another gmem1 fd without mmap.
> 
> However, it fails actually, because the kvm_arch_suports_gmem_mmap() 
> returns false for TDX VMs, which means userspace cannot allocate gmem 
> with mmap just for shared memory for TDX.
> 
> SO my question is do we want to support such case?
> 



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  2025-07-22 11:08         ` Fuad Tabba
@ 2025-07-22 14:32           ` Sean Christopherson
  2025-07-22 15:30             ` Fuad Tabba
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-22 14:32 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Xiaoyao Li, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, Jul 22, 2025, Fuad Tabba wrote:
> On Tue, 22 Jul 2025 at 06:36, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> > - In 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch,
> > there is double gmem in the name of vmx/vt 's callback implementation:
> >
> >      vt_gmem_gmem_max_mapping_level
> >      tdx_gmem_gmem_max_mapping_level
> >      vt_op_tdx_only(gmem_gmem_max_mapping_level)
> 
> Sean's patches do that, then he fixes it in a later patch. I'll fix
> this at the source.

Dagnabbit.  I goofed a search+replace, caught it when re-reading things, and
fixed-up the wrong commit.  Sorry :-(

> > - In 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch,
> >    kvm_x86_call(gmem_max_mapping_level)(...) returns 0 for !private case.
> >    It's not correct though it works without issue currently.
> >
> >    Because current gmem doesn't support hugepage so that the max_level
> >    gotten from gmem is always PG_LEVEL_4K and it returns early in
> >    kvm_gmem_max_mapping_level() on
> >
> >         if (max_level == PG_LEVEL_4K)
> >                 return max_level;
> >
> >    But just look at the following case:
> >
> >      return min(max_level,
> >         kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
> >
> >    For non-TDX case and non-SNP case, it will return 0, i.e.
> >    PG_LEVEL_NONE eventually.
> >
> >    so either 1) return PG_LEVEL_NUM/PG_LEVEL_1G for the cases where
> >    .gmem_max_mapping_level callback doesn't have specific restriction.
> >
> >    or 2)
> >
> >         tmp = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
> >         if (tmp)
> >                 return min(max_level, tmp);
> >
> >         return max-level;
> 
> Sean? What do you think?

#2, because KVM uses a "ret0" static call when TDX is disabled (and KVM should
do the same when SEV is disabled, but the SEV #ifdefs are still a bit messy).
Switching to any other value would require adding a VMX stubs for the !TDX case.

I think it makes sense to explicitly call that out as the "CoCo level", to help
unfamiliar readers understand why vendor code has any say in the max
mapping level.

And I would say we adjust max_level instead of having an early return, e.g. to
reduce the probability of future bugs due to adding code between the call to
.gmem_max_mapping_level() and the final return.

This as fixup? 

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eead5dca6f72..a51013e0992a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3279,9 +3279,9 @@ static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fau
                                     const struct kvm_memory_slot *slot, gfn_t gfn,
                                     bool is_private)
 {
+       u8 max_level, coco_level;
        struct page *page;
        kvm_pfn_t pfn;
-       u8 max_level;
 
        /* For faults, use the gmem information that was resolved earlier. */
        if (fault) {
@@ -3305,8 +3305,16 @@ static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fau
        if (max_level == PG_LEVEL_4K)
                return max_level;
 
-       return min(max_level,
-                  kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
+       /*
+        * CoCo may influence the max mapping level, e.g. due to RMP or S-EPT
+        * restrictions.  A return of '0' means "no additional restrictions",
+        * to allow for using an optional "ret0" static call.
+        */
+       coco_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
+       if (coco_level)
+               max_level = min(max_level, coco_level);
+
+       return max_level;
 }
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-21 23:50                 ` Vishal Annapurve
@ 2025-07-22 14:35                   ` Sean Christopherson
  2025-07-23 14:08                     ` Vishal Annapurve
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-22 14:35 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Xiaoyao Li, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Mon, Jul 21, 2025, Vishal Annapurve wrote:
> On Mon, Jul 21, 2025 at 3:21 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Jul 21, 2025, Vishal Annapurve wrote:
> > > On Mon, Jul 21, 2025 at 10:29 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > >
> > > > > > > 2) KVM fetches shared faults through userspace page tables and not
> > > > > > > guest_memfd directly.
> > > > > >
> > > > > > This is also irrelevant.  KVM _already_ supports resolving shared faults through
> > > > > > userspace page tables.  That support won't go away as KVM will always need/want
> > > > > > to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).
> > >
> > > As a combination of [1] and [2], I believe we are saying that for
> > > memslots backed by mappable guest_memfd files, KVM will always serve
> > > both shared/private faults using kvm_gmem_get_pfn().
> >
> > No, KVM can't guarantee that with taking and holding mmap_lock across hva_to_pfn(),
> > and as I mentioned earlier in the thread, that's a non-starter for me.
> 
> I think what you mean is that if KVM wants to enforce the behavior
> that VMAs passed by the userspace are backed by the same guest_memfd
> file as passed in the memslot then KVM will need to hold mmap_lock
> across hva_to_pfn() to verify that.

No, I'm talking about the case where userspace creates a memslot *without*
KVM_MEM_GUEST_MEMFD, but with userspace_addr pointing at a mmap()'d guest_memfd
instance.  That is the scenario Xiaoyao brought up:

 : Actually, QEMU can use gmem with mmap support as the normal memory even
 : without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
 : on KVM_SET_USER_MEMORY_REGION2.
 :
 : ...
 : 
 : However, it fails actually, because the kvm_arch_suports_gmem_mmap()
 : returns false for TDX VMs, which means userspace cannot allocate gmem
 : with mmap just for shared memory for TDX.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-22 14:28     ` Xiaoyao Li
@ 2025-07-22 14:37       ` Sean Christopherson
  2025-07-22 15:31         ` Xiaoyao Li
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-22 14:37 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, Jul 22, 2025, Xiaoyao Li wrote:
> On 7/21/2025 8:22 PM, Xiaoyao Li wrote:
> > On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > > +/*
> > > + * CoCo VMs with hardware support that use guest_memfd only for
> > > backing private
> > > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping
> > > enabled.
> > > + */
> > > +#define kvm_arch_supports_gmem_mmap(kvm)        \
> > > +    (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&    \
> > > +     (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
> > 
> > I want to share the findings when I do the POC to enable gmem mmap in QEMU.
> > 
> > Actually, QEMU can use gmem with mmap support as the normal memory even
> > without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
> > on KVM_SET_USER_MEMORY_REGION2.
> > 
> > Since the gmem is mmapable, QEMU can pass the userspace addr got from
> > mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
> > works well for non-coco VMs on x86.
> 
> one more findings.
> 
> I tested with QEMU by creating normal (non-private) memory with mmapable
> guest memfd, and enforcily passing the fd of the gmem to struct
> kvm_userspace_memory_region2 when QEMU sets up memory region.
> 
> It hits the kvm_gmem_bind() error since QEMU tries to back different GPA
> region with the same gmem.
> 
> So, the question is do we want to allow the multi-binding for shared-only
> gmem?

Can you elaborate, maybe with code?  I don't think I fully understand the setup.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level
  2025-07-22 14:32           ` Sean Christopherson
@ 2025-07-22 15:30             ` Fuad Tabba
  0 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-22 15:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, 22 Jul 2025 at 15:32, Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jul 22, 2025, Fuad Tabba wrote:
> > On Tue, 22 Jul 2025 at 06:36, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> > > - In 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch,
> > > there is double gmem in the name of vmx/vt 's callback implementation:
> > >
> > >      vt_gmem_gmem_max_mapping_level
> > >      tdx_gmem_gmem_max_mapping_level
> > >      vt_op_tdx_only(gmem_gmem_max_mapping_level)
> >
> > Sean's patches do that, then he fixes it in a later patch. I'll fix
> > this at the source.
>
> Dagnabbit.  I goofed a search+replace, caught it when re-reading things, and
> fixed-up the wrong commit.  Sorry :-(
>
> > > - In 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch,
> > >    kvm_x86_call(gmem_max_mapping_level)(...) returns 0 for !private case.
> > >    It's not correct though it works without issue currently.
> > >
> > >    Because current gmem doesn't support hugepage so that the max_level
> > >    gotten from gmem is always PG_LEVEL_4K and it returns early in
> > >    kvm_gmem_max_mapping_level() on
> > >
> > >         if (max_level == PG_LEVEL_4K)
> > >                 return max_level;
> > >
> > >    But just look at the following case:
> > >
> > >      return min(max_level,
> > >         kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
> > >
> > >    For non-TDX case and non-SNP case, it will return 0, i.e.
> > >    PG_LEVEL_NONE eventually.
> > >
> > >    so either 1) return PG_LEVEL_NUM/PG_LEVEL_1G for the cases where
> > >    .gmem_max_mapping_level callback doesn't have specific restriction.
> > >
> > >    or 2)
> > >
> > >         tmp = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
> > >         if (tmp)
> > >                 return min(max_level, tmp);
> > >
> > >         return max-level;
> >
> > Sean? What do you think?
>
> #2, because KVM uses a "ret0" static call when TDX is disabled (and KVM should
> do the same when SEV is disabled, but the SEV #ifdefs are still a bit messy).
> Switching to any other value would require adding a VMX stubs for the !TDX case.
>
> I think it makes sense to explicitly call that out as the "CoCo level", to help
> unfamiliar readers understand why vendor code has any say in the max
> mapping level.
>
> And I would say we adjust max_level instead of having an early return, e.g. to
> reduce the probability of future bugs due to adding code between the call to
> .gmem_max_mapping_level() and the final return.
>
> This as fixup?

Applied it to my tree. Builds and runs fine. Thanks!
/fuad

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index eead5dca6f72..a51013e0992a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3279,9 +3279,9 @@ static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fau
>                                      const struct kvm_memory_slot *slot, gfn_t gfn,
>                                      bool is_private)
>  {
> +       u8 max_level, coco_level;
>         struct page *page;
>         kvm_pfn_t pfn;
> -       u8 max_level;
>
>         /* For faults, use the gmem information that was resolved earlier. */
>         if (fault) {
> @@ -3305,8 +3305,16 @@ static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fau
>         if (max_level == PG_LEVEL_4K)
>                 return max_level;
>
> -       return min(max_level,
> -                  kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
> +       /*
> +        * CoCo may influence the max mapping level, e.g. due to RMP or S-EPT
> +        * restrictions.  A return of '0' means "no additional restrictions",
> +        * to allow for using an optional "ret0" static call.
> +        */
> +       coco_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
> +       if (coco_level)
> +               max_level = min(max_level, coco_level);
> +
> +       return max_level;
>  }
>
>  int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-22 14:37       ` Sean Christopherson
@ 2025-07-22 15:31         ` Xiaoyao Li
  2025-07-22 15:50           ` David Hildenbrand
  2025-07-22 15:54           ` Sean Christopherson
  0 siblings, 2 replies; 86+ messages in thread
From: Xiaoyao Li @ 2025-07-22 15:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 7/22/2025 10:37 PM, Sean Christopherson wrote:
> On Tue, Jul 22, 2025, Xiaoyao Li wrote:
>> On 7/21/2025 8:22 PM, Xiaoyao Li wrote:
>>> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
>>>> +/*
>>>> + * CoCo VMs with hardware support that use guest_memfd only for
>>>> backing private
>>>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping
>>>> enabled.
>>>> + */
>>>> +#define kvm_arch_supports_gmem_mmap(kvm)        \
>>>> +    (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&    \
>>>> +     (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
>>>
>>> I want to share the findings when I do the POC to enable gmem mmap in QEMU.
>>>
>>> Actually, QEMU can use gmem with mmap support as the normal memory even
>>> without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
>>> on KVM_SET_USER_MEMORY_REGION2.
>>>
>>> Since the gmem is mmapable, QEMU can pass the userspace addr got from
>>> mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
>>> works well for non-coco VMs on x86.
>>
>> one more findings.
>>
>> I tested with QEMU by creating normal (non-private) memory with mmapable
>> guest memfd, and enforcily passing the fd of the gmem to struct
>> kvm_userspace_memory_region2 when QEMU sets up memory region.
>>
>> It hits the kvm_gmem_bind() error since QEMU tries to back different GPA
>> region with the same gmem.
>>
>> So, the question is do we want to allow the multi-binding for shared-only
>> gmem?
> 
> Can you elaborate, maybe with code?  I don't think I fully understand the setup.

Well, I haven't fully sorted it out; just sharing what I have so far.

The problem is hit when SMM is enabled (which it is by default).

- The trace of "-machine q35,smm=off":

kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000 
ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000 
size=0x80000000 ua=0x7f57b3fff000 guest_memfd=15 
guest_memfd_offset=0x80000000 ret=0
kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000 
size=0x400000 ua=0x7f5840a00000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x0 gpa=0x0 size=0x0 
ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0xc0000 
ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x2 gpa=0xc0000 
size=0x20000 ua=0x7f5841000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
kvm_set_user_memory AddrSpace#0 Slot#4 flags=0x2 gpa=0xe0000 
size=0x20000 ua=0x7f5840de0000 guest_memfd=-1 
guest_memfd_offset=0x3e0000 ret=0
kvm_set_user_memory AddrSpace#0 Slot#5 flags=0x4 gpa=0x100000 
size=0x7ff00000 ua=0x7f57340ff000 guest_memfd=15 
guest_memfd_offset=0x100000 ret=0

- The trace of "-machine q35"

kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000 
ua=0x7f8faffff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000 
size=0x80000000 ua=0x7f902ffff000 guest_memfd=15 
guest_memfd_offset=0x80000000 ret=0
kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000 
size=0x400000 ua=0x7f90bd000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x4 gpa=0xfeda0000 
size=0x20000 ua=0x7f8fb009f000 guest_memfd=15 guest_memfd_offset=0xa0000 
ret=-22
qemu-system-x86_64: kvm_set_user_memory_region: 
KVM_SET_USER_MEMORY_REGION2 failed, slot=3, start=0xfeda0000, 
size=0x20000, flags=0x4, guest_memfd=15, guest_memfd_offset=0xa0000: 
Invalid argument
kvm_set_phys_mem: error registering slot: Invalid argument


where QEMU tries to set up the memory region for [0xfeda0000, +0x20000], 
which is backed by the gmem (fd 15) allocated for normal RAM, at offset 
0xa0000.

What I have tracked it down to in QEMU is mch_realize(), which sets up a 
memory region starting at 0xfeda0000.

If you want to reproduce it yourself, here is my QEMU branch

   https://github.com/intel-staging/qemu-tdx.git lxy/gmem-mmap-poc

To boot a VM with guest memfd:

   -object memory-backend-guest-memfd,id=gmem0,size=$mem
   -machine memory-backend=gmem0



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-22 15:31         ` Xiaoyao Li
@ 2025-07-22 15:50           ` David Hildenbrand
  2025-07-22 15:54           ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: David Hildenbrand @ 2025-07-22 15:50 UTC (permalink / raw)
  To: Xiaoyao Li, Sean Christopherson
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 22.07.25 17:31, Xiaoyao Li wrote:
> On 7/22/2025 10:37 PM, Sean Christopherson wrote:
>> On Tue, Jul 22, 2025, Xiaoyao Li wrote:
>>> On 7/21/2025 8:22 PM, Xiaoyao Li wrote:
>>>> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
>>>>> +/*
>>>>> + * CoCo VMs with hardware support that use guest_memfd only for
>>>>> backing private
>>>>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping
>>>>> enabled.
>>>>> + */
>>>>> +#define kvm_arch_supports_gmem_mmap(kvm)        \
>>>>> +    (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&    \
>>>>> +     (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
>>>>
>>>> I want to share the findings when I do the POC to enable gmem mmap in QEMU.
>>>>
>>>> Actually, QEMU can use gmem with mmap support as the normal memory even
>>>> without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
>>>> on KVM_SET_USER_MEMORY_REGION2.
>>>>
>>>> Since the gmem is mmapable, QEMU can pass the userspace addr got from
>>>> mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
>>>> works well for non-coco VMs on x86.
>>>
>>> one more findings.
>>>
>>> I tested with QEMU by creating normal (non-private) memory with mmapable
>>> guest memfd, and enforcily passing the fd of the gmem to struct
>>> kvm_userspace_memory_region2 when QEMU sets up memory region.
>>>
>>> It hits the kvm_gmem_bind() error since QEMU tries to back different GPA
>>> region with the same gmem.
>>>
>>> So, the question is do we want to allow the multi-binding for shared-only
>>> gmem?
>>
>> Can you elaborate, maybe with code?  I don't think I fully understand the setup.
> 
> well, I haven't fully sorted it out. Just share what I get so far.
> 
> the problem hit when SMM is enabled (which is enabled by default).
> 
> - The trace of "-machine q35,smm=off":
> 
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000
> size=0x80000000 ua=0x7f57b3fff000 guest_memfd=15
> guest_memfd_offset=0x80000000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000
> size=0x400000 ua=0x7f5840a00000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x0 gpa=0x0 size=0x0
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0xc0000
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x2 gpa=0xc0000
> size=0x20000 ua=0x7f5841000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#4 flags=0x2 gpa=0xe0000
> size=0x20000 ua=0x7f5840de0000 guest_memfd=-1
> guest_memfd_offset=0x3e0000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#5 flags=0x4 gpa=0x100000
> size=0x7ff00000 ua=0x7f57340ff000 guest_memfd=15
> guest_memfd_offset=0x100000 ret=0
> 
> - The trace of "-machine q35"
> 
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000
> ua=0x7f8faffff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000
> size=0x80000000 ua=0x7f902ffff000 guest_memfd=15
> guest_memfd_offset=0x80000000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000
> size=0x400000 ua=0x7f90bd000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x4 gpa=0xfeda0000
> size=0x20000 ua=0x7f8fb009f000 guest_memfd=15 guest_memfd_offset=0xa0000
> ret=-22
> qemu-system-x86_64: kvm_set_user_memory_region:
> KVM_SET_USER_MEMORY_REGION2 failed, slot=3, start=0xfeda0000,
> size=0x20000, flags=0x4, guest_memfd=15, guest_memfd_offset=0xa0000:
> Invalid argument
> kvm_set_phys_mem: error registering slot: Invalid argument

Weird. When splitting regions (I think that is what's happening), QEMU 
should first remove the old slots and then insert the new ones.

Otherwise there would be GPA overlaps as well?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-22 15:31         ` Xiaoyao Li
  2025-07-22 15:50           ` David Hildenbrand
@ 2025-07-22 15:54           ` Sean Christopherson
  1 sibling, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2025-07-22 15:54 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, Jul 22, 2025, Xiaoyao Li wrote:
> On 7/22/2025 10:37 PM, Sean Christopherson wrote:
> > On Tue, Jul 22, 2025, Xiaoyao Li wrote:
> > > On 7/21/2025 8:22 PM, Xiaoyao Li wrote:
> > > > On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > > > > +/*
> > > > > + * CoCo VMs with hardware support that use guest_memfd only for
> > > > > backing private
> > > > > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping
> > > > > enabled.
> > > > > + */
> > > > > +#define kvm_arch_supports_gmem_mmap(kvm)        \
> > > > > +    (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&    \
> > > > > +     (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
> > > > 
> > > > I want to share the findings when I do the POC to enable gmem mmap in QEMU.
> > > > 
> > > > Actually, QEMU can use gmem with mmap support as the normal memory even
> > > > without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
> > > > on KVM_SET_USER_MEMORY_REGION2.
> > > > 
> > > > Since the gmem is mmapable, QEMU can pass the userspace addr got from
> > > > mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
> > > > works well for non-coco VMs on x86.
> > > 
> > > one more findings.
> > > 
> > > I tested with QEMU by creating normal (non-private) memory with mmapable
> > > guest memfd, and enforcily passing the fd of the gmem to struct
> > > kvm_userspace_memory_region2 when QEMU sets up memory region.
> > > 
> > > It hits the kvm_gmem_bind() error since QEMU tries to back different GPA
> > > region with the same gmem.
> > > 
> > > So, the question is do we want to allow the multi-binding for shared-only
> > > gmem?
> > 
> > Can you elaborate, maybe with code?  I don't think I fully understand the setup.
> 
> well, I haven't fully sorted it out. Just share what I get so far.
> 
> the problem hit when SMM is enabled (which is enabled by default).
> 
> - The trace of "-machine q35,smm=off":
> 
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000
> size=0x80000000 ua=0x7f57b3fff000 guest_memfd=15
> guest_memfd_offset=0x80000000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000
> size=0x400000 ua=0x7f5840a00000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x0 gpa=0x0 size=0x0
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0xc0000
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x2 gpa=0xc0000 size=0x20000
> ua=0x7f5841000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#4 flags=0x2 gpa=0xe0000 size=0x20000
> ua=0x7f5840de0000 guest_memfd=-1 guest_memfd_offset=0x3e0000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#5 flags=0x4 gpa=0x100000
> size=0x7ff00000 ua=0x7f57340ff000 guest_memfd=15 guest_memfd_offset=0x100000
> ret=0
> 
> - The trace of "-machine q35"
> 
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000
> ua=0x7f8faffff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000
> size=0x80000000 ua=0x7f902ffff000 guest_memfd=15
> guest_memfd_offset=0x80000000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000
> size=0x400000 ua=0x7f90bd000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x4 gpa=0xfeda0000 size=0x20000
> ua=0x7f8fb009f000 guest_memfd=15 guest_memfd_offset=0xa0000 ret=-22
> qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION2
> failed, slot=3, start=0xfeda0000, size=0x20000, flags=0x4, guest_memfd=15,
> guest_memfd_offset=0xa0000: Invalid argument
> kvm_set_phys_mem: error registering slot: Invalid argument
> 
> 
> where QEMU tries to set up the memory region for [0xfeda0000, +0x20000],
> which is backed by the gmem (fd 15) allocated for normal RAM, at offset
> 0xa0000.
> 
> What I have tracked it down to in QEMU is mch_realize(), which sets up a
> memory region starting at 0xfeda0000.

Oh yay, SMM.  The problem lies in memory regions that are aliased into low memory
(IIRC, there's at least one other such scenario, but don't quote me on that).
For SMRAM, when the "high" SMRAM location (0xfeda0000) is enabled, the "legacy"
SMRAM location (0xa0000) gets remapped (aliased in QEMU's vernacular) to the
high location, resulting in two CPU physical addresses pointing at the same
underlying memory[*].  From KVM's perspective, that means two GPA ranges pointing
at the same HVA.

As for whether or not we want to support such madness...  I'd definitely say "not
now", and probably not ever.  Emulating SMM puts the VMM *firmly* in the TCB of
the guest, and so guest_memfd benefits like not having to map guest memory into
userspace pretty much go out the window.  For such a use case, I don't think it's
unreasonable to require QEMU (or any other VMM) to map the aliases via HVA only,
i.e. to not take full advantage of guest_memfd.

[*] https://opensecuritytraining.info/IntroBIOS_files/Day1_08_Advanced%20x86%20-%20BIOS%20and%20SMM%20Internals%20-%20SMRAM.pdf
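For illustration, the bind failure in the trace above can be replayed against a
toy model. The sketch below is plain Python and is not KVM's actual
implementation (kvm_gmem_bind() tracks bindings per guest_memfd file and
rejects a range that is already bound); it only demonstrates the rule the SMRAM
alias trips over: slot 3 asks to bind gmem offsets [0xa0000, 0xc0000), which
are already covered by slot 0's binding of offsets [0x0, 0x80000000), so the
bind fails with -EINVAL (-22).

```python
# Toy model of kvm_gmem_bind()'s "one binding per gmem offset range" rule.
# Illustrative only; KVM tracks bindings per-file, not in a Python list.

EINVAL = 22

def overlaps(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) intersects [start_b, start_b+len_b)."""
    return start_a < start_b + len_b and start_b < start_a + len_a

class Gmem:
    def __init__(self):
        self.bindings = []  # accepted (gpa, size, gmem_offset) tuples

    def bind(self, gpa, size, offset):
        for _, bound_size, bound_off in self.bindings:
            if overlaps(offset, size, bound_off, bound_size):
                return -EINVAL  # offset range already bound to another slot
        self.bindings.append((gpa, size, offset))
        return 0

gmem = Gmem()
# Slot 0: low 2GiB of RAM at gmem offset 0 (from the q35 trace above).
print(gmem.bind(0x0, 0x80000000, 0x0))                 # -> 0
# Slot 1: high RAM at gmem offset 0x80000000.
print(gmem.bind(0x100000000, 0x80000000, 0x80000000))  # -> 0
# Slot 3: high SMRAM alias at GPA 0xfeda0000 reusing gmem offset 0xa0000,
# which slot 0's binding already covers -> rejected.
print(gmem.bind(0xfeda0000, 0x20000, 0xa0000))         # -> -22
```

Two distinct GPA ranges are fine as long as their gmem offset ranges are
disjoint; it is the reuse of the same offset range that is refused.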


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-22  9:29         ` Fuad Tabba
@ 2025-07-22 15:58           ` Sean Christopherson
  2025-07-22 16:01             ` Fuad Tabba
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-22 15:58 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, Jul 22, 2025, Fuad Tabba wrote:
> On Mon, 21 Jul 2025 at 18:33, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Jul 21, 2025, Fuad Tabba wrote:
> > > > The below diff applies on top.  I'm guessing there may be some intermediate
> > > > ugliness (I haven't mapped out exactly where/how to squash this throughout the
> > > > series, and there is feedback relevant to future patches), but IMO this is a much
> > > > cleaner resting state (see the diff stats).
> > >
> > > So just so that I am clear, applying the diff below to the appropriate
> > > patches would address all the concerns that you have mentioned in this
> > > email?
> >
> > Yes?  It should, I just don't want to pinky swear in case I botched something.
> 
> Other than this patch not applying, nah, I think it's all good ;P. I
> guess base-commit: 9eba3a9ac9cd5922da7f6e966c01190f909ed640 is
> somewhere in a local tree of yours. There are quite a few conflicts
> and I don't think it would build even if based on the right tree,
> e.g., KVM_CAP_GUEST_MEMFD_MMAP is a rename of KVM_CAP_GMEM_MMAP,
> rather than an addition of an undeclared identifier.
> 
> That said, I think I understand what you mean, and I can apply the
> spirit of this patch.
> 
> Stay tuned for v16.

Want to point me at your branch?  I can run it through my battery of tests, and
maybe save you/us from having to spin a v17.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-22 15:58           ` Sean Christopherson
@ 2025-07-22 16:01             ` Fuad Tabba
  2025-07-22 23:42               ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Fuad Tabba @ 2025-07-22 16:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, 22 Jul 2025 at 16:58, Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jul 22, 2025, Fuad Tabba wrote:
> > On Mon, 21 Jul 2025 at 18:33, Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Mon, Jul 21, 2025, Fuad Tabba wrote:
> > > > > The below diff applies on top.  I'm guessing there may be some intermediate
> > > > > ugliness (I haven't mapped out exactly where/how to squash this throughout the
> > > > > series, and there is feedback relevant to future patches), but IMO this is a much
> > > > > cleaner resting state (see the diff stats).
> > > >
> > > > So just so that I am clear, applying the diff below to the appropriate
> > > > patches would address all the concerns that you have mentioned in this
> > > > email?
> > >
> > > Yes?  It should, I just don't want to pinky swear in case I botched something.
> >
> > Other than this patch not applying, nah, I think it's all good ;P. I
> > guess base-commit: 9eba3a9ac9cd5922da7f6e966c01190f909ed640 is
> > somewhere in a local tree of yours. There are quite a few conflicts
> > and I don't think it would build even if based on the right tree,
> > e.g., KVM_CAP_GUEST_MEMFD_MMAP is a rename of KVM_CAP_GMEM_MMAP,
> > rather than an addition of an undeclared identifier.
> >
> > That said, I think I understand what you mean, and I can apply the
> > spirit of this patch.
> >
> > Stay tuned for v16.
>
> Want to point me at your branch?  I can run it through my battery of tests, and
> maybe save you/us from having to spin a v17.

That would be great. Here it is:

https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-basic-6.16-v16

No known issues from my end. But can you have a look at the patch:

KVM: guest_memfd: Consolidate Kconfig and guest_memfd enable checks

In that patch I collected the changes to the config/enable checks that
didn't seem to fit well in any of the other patches.

Cheers,
/fuad


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-22 16:01             ` Fuad Tabba
@ 2025-07-22 23:42               ` Sean Christopherson
  2025-07-23  9:22                 ` Fuad Tabba
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-22 23:42 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, Jul 22, 2025, Fuad Tabba wrote:
> On Tue, 22 Jul 2025 at 16:58, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Jul 22, 2025, Fuad Tabba wrote:
> > > On Mon, 21 Jul 2025 at 18:33, Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Mon, Jul 21, 2025, Fuad Tabba wrote:
> > > > > > The below diff applies on top.  I'm guessing there may be some intermediate
> > > > > > ugliness (I haven't mapped out exactly where/how to squash this throughout the
> > > > > > series, and there is feedback relevant to future patches), but IMO this is a much
> > > > > > cleaner resting state (see the diff stats).
> > > > >
> > > > > So just so that I am clear, applying the diff below to the appropriate
> > > > > patches would address all the concerns that you have mentioned in this
> > > > > email?
> > > >
> > > > Yes?  It should, I just don't want to pinky swear in case I botched something.
> > >
> > > Other than this patch not applying, nah, I think it's all good ;P. I
> > > guess base-commit: 9eba3a9ac9cd5922da7f6e966c01190f909ed640 is
> > > somewhere in a local tree of yours. There are quite a few conflicts
> > > and I don't think it would build even if based on the right tree,
> > > e.g., KVM_CAP_GUEST_MEMFD_MMAP is a rename of KVM_CAP_GMEM_MMAP,
> > > rather than an addition of an undeclared identifier.
> > >
> > > That said, I think I understand what you mean, and I can apply the
> > > spirit of this patch.
> > >
> > > Stay tuned for v16.
> >
> > Want to point me at your branch?  I can run it through my battery of tests, and
> > maybe save you/us from having to spin a v17.
> 
> That would be great. Here it is:
> 
> https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-basic-6.16-v16
> 
> No known issues from my end. But can you have a look at the patch:
> 
> KVM: guest_memfd: Consolidate Kconfig and guest_memfd enable checks
> 
> In that patch I collected the changes to the config/enable checks that
> didn't seem to fit well in any of the other patches.

Regarding config stuff, patch 02, KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE, is missing a KVM_GMEM => KVM_GUEST_MEMFD rename.

While playing with this, I also discovered why this code lives in the KVM_X86 config:

  select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM

Commit ea4290d77bda ("KVM: x86: leave kvm.ko out of the build if no vendor module
is requested") didn't have all the vendor neutral configs depend on KVM_X86, and
so it's possible to end up with unmet dependencies.  E.g. KVM_SW_PROTECTED_VM can
be selected with KVM_X86=n, and thus with KVM_GUEST_MEMFD=n.
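As a hedged sketch of why the select has to live under KVM_X86 (symbol bodies
abbreviated; this is not the literal arch/x86/kvm/Kconfig text):

```kconfig
# Sketch only -- not the literal x86 Kconfig contents.

config KVM_X86
	def_tristate KVM if KVM_INTEL || KVM_AMD
	# The select lives here, gated on KVM_SW_PROTECTED_VM, so it cannot
	# fire when KVM_X86=n (e.g. KVM=y with no vendor module requested):
	select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM

config KVM_SW_PROTECTED_VM
	bool "Enable support for KVM software-protected VMs"
	depends on KVM && X86_64
	# Depends on the vendor-neutral KVM symbol, not KVM_X86, so if the
	# select were here instead, KVM_SW_PROTECTED_VM=y with KVM_X86=n
	# would leave the gmem configs with unmet dependencies.
```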

We could punt on that mess until after this series, but that'd be even more
churn, and I'm not sure I could stomach giving acks for the continued addition
of ugly kconfig dependencies. :-)

Lastly, regarding "Consolidate Kconfig and guest_memfd enable checks", that needs
to land before f6a5f3a22bbe ("KVM: guest_memfd: Allow host to map guest_memfd pages"),
otherwise KVM will present a weird state where guest_memfd can be used for default
VMs, but only if KVM_GUEST_MEMFD happens to be selected by something else.
That also provides a better shortlog: "KVM: x86: Enable KVM_GUEST_MEMFD for all
64-bit builds".  The config cleanups and consolidations are a nice side effect,
but what that patch is really doing is enabling KVM_GUEST_MEMFD more broadly.

Actually, all of the arch patches need to come before f6a5f3a22bbe ("KVM: guest_memfd:
Allow host to map guest_memfd pages"), otherwise intermediate builds will have
half-baked support for guest_memfd mmap().  Or rather, KVM shouldn't let userspace
enable GUEST_MEMFD_FLAG_MMAP until all the plumbing is in place.  I suspect that
trying to shuffle the full patches around will create cyclical dependency hell.
It's easy enough to hold off on adding GUEST_MEMFD_FLAG_MMAP until KVM is fully
ready, so I think it makes sense to just add GUEST_MEMFD_FLAG_MMAP along with the
capability.

Rather than trying to pass partial patches around, I pushed a branch to:

  https://github.com/sean-jc/linux.git x86/gmem_mmap

Outside of the x86 config crud, and deferring GUEST_MEMFD_FLAG_MMAP until KVM is
fully prepped, there _shouldn't_ be any changes relative to what you have.

Note, it's based on:

  https://github.com/kvm-x86/linux.git next

as there are x86 kconfig dependencies/conflicts with changes that are destined
for 6.17 (and I don't think landing this in 6.17 is realistic, i.e. this series
will effectively follow kvm-x86/next no matter what).

I haven't done a ton of runtime testing yet, but it passes all of my build tests
(I have far too many configs), so I'm reasonably confident all the kconfig stuff
isn't horribly broken.

Oh, and I also squashed this into the very last patch.  The curly braces, line
wrap, and hardcoded boolean are all superfluous.

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 4cdccabc160c..a0c5db8fd72d 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -249,8 +249,7 @@ static bool check_vm_type(unsigned long vm_type)
        return kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type);
 }
 
-static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
-                          bool expect_mmap_allowed)
+static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags)
 {
        struct kvm_vm *vm;
        size_t total_size;
@@ -272,7 +271,7 @@ static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
 
        test_file_read_write(fd);
 
-       if (expect_mmap_allowed) {
+       if (guest_memfd_flags & GUEST_MEMFD_FLAG_MMAP) {
                test_mmap_supported(fd, page_size, total_size);
                test_fault_overflow(fd, page_size, total_size);
 
@@ -343,13 +342,11 @@ int main(int argc, char *argv[])
 
        test_gmem_flag_validity();
 
-       test_with_type(VM_TYPE_DEFAULT, 0, false);
-       if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MMAP)) {
-               test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_MMAP,
-                              true);
-       }
+       test_with_type(VM_TYPE_DEFAULT, 0);
+       if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MMAP))
+               test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_MMAP);
 
 #ifdef __x86_64__
-       test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
+       test_with_type(KVM_X86_SW_PROTECTED_VM, 0);
 #endif
 }


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-07-22 12:31   ` Kunwu Chan
@ 2025-07-23  8:20     ` Marc Zyngier
  2025-07-23 11:44       ` Kunwu Chan
  0 siblings, 1 reply; 86+ messages in thread
From: Marc Zyngier @ 2025-07-23  8:20 UTC (permalink / raw)
  To: Kunwu Chan
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, seanjc, viro,
	brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On Tue, 22 Jul 2025 13:31:34 +0100,
Kunwu Chan <kunwu.chan@linux.dev> wrote:
> 
> On 2025/7/18 00:27, Fuad Tabba wrote:
> > Add arm64 architecture support for handling guest page faults on memory
> > slots backed by guest_memfd.
> > 
> > This change introduces a new function, gmem_abort(), which encapsulates
> > the fault handling logic specific to guest_memfd-backed memory. The
> > kvm_handle_guest_abort() entry point is updated to dispatch to
> > gmem_abort() when a fault occurs on a guest_memfd-backed memory slot (as
> > determined by kvm_slot_has_gmem()).
> > 
> > Until guest_memfd gains support for huge pages, the fault granule for
> > these memory regions is restricted to PAGE_SIZE.
> 
> Since huge pages are not currently supported, would it be friendlier
> to define something like
> 
> "#define GMEM_PAGE_GRANULE PAGE_SIZE" at the top (rather than
> hardcoding PAGE_SIZE)
> 
>  and make it easier to switch to huge page support later?

No. PAGE_SIZE always has to be the fallback, no matter what. When (and
if) larger mappings get supported, there will be extra code for that
purpose, not just flipping a definition.

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-07-17 16:27 ` [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
  2025-07-22 12:31   ` Kunwu Chan
@ 2025-07-23  8:26   ` Marc Zyngier
  1 sibling, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2025-07-23  8:26 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On Thu, 17 Jul 2025 17:27:26 +0100,
Fuad Tabba <tabba@google.com> wrote:
> 
> Add arm64 architecture support for handling guest page faults on memory
> slots backed by guest_memfd.
> 
> This change introduces a new function, gmem_abort(), which encapsulates
> the fault handling logic specific to guest_memfd-backed memory. The
> kvm_handle_guest_abort() entry point is updated to dispatch to
> gmem_abort() when a fault occurs on a guest_memfd-backed memory slot (as
> determined by kvm_slot_has_gmem()).
> 
> Until guest_memfd gains support for huge pages, the fault granule for
> these memory regions is restricted to PAGE_SIZE.
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 17/21] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd
  2025-07-17 16:27 ` [PATCH v15 17/21] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd Fuad Tabba
@ 2025-07-23  8:29   ` Marc Zyngier
  0 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2025-07-23  8:29 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On Thu, 17 Jul 2025 17:27:27 +0100,
Fuad Tabba <tabba@google.com> wrote:
> 
> Handle faults for memslots backed by guest_memfd in arm64 nested
> virtualization triggered by VNCR_EL2.
> 
> * Introduce is_gmem output parameter to kvm_translate_vncr(), indicating
>   whether the faulted memory slot is backed by guest_memfd.
> 
> * Dispatch faults backed by guest_memfd to kvm_gmem_get_pfn().
> 
> * Update kvm_handle_vncr_abort() to handle potential guest_memfd errors.
>   Some of the guest_memfd errors need to be handled by userspace,
>   instead of attempting to (implicitly) retry by returning to the guest.
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 18/21] KVM: arm64: Enable host mapping of shared guest_memfd memory
  2025-07-17 16:27 ` [PATCH v15 18/21] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
@ 2025-07-23  8:33   ` Marc Zyngier
  2025-07-23  9:18     ` Fuad Tabba
  0 siblings, 1 reply; 86+ messages in thread
From: Marc Zyngier @ 2025-07-23  8:33 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On Thu, 17 Jul 2025 17:27:28 +0100,
Fuad Tabba <tabba@google.com> wrote:
> 
> Enable host userspace mmap support for guest_memfd-backed memory on
> arm64. This change provides arm64 with the capability to map guest
> memory at the host directly from guest_memfd:
> 
> * Define kvm_arch_supports_gmem_mmap() for arm64: The
>   kvm_arch_supports_gmem_mmap() macro is defined for arm64 to be true if
>   CONFIG_KVM_GMEM_SUPPORTS_MMAP is enabled. For existing arm64 KVM VM
>   types that support guest_memfd, this enables them to use guest_memfd
>   with host userspace mappings. This provides a consistent behavior as
>   there are currently no arm64 CoCo VMs that rely on guest_memfd solely
>   for private, non-mappable memory. Future arm64 VM types can override
>   or restrict this behavior via the kvm_arch_supports_gmem_mmap() hook
>   if needed.
> 
> * Select CONFIG_KVM_GMEM_SUPPORTS_MMAP in arm64 Kconfig.
> 
> * Enforce KVM_MEMSLOT_GMEM_ONLY for guest_memfd on arm64: Checks are
>   added to ensure that if guest_memfd is enabled on arm64,
>   KVM_GMEM_SUPPORTS_MMAP must also be enabled. This means
>   guest_memfd-backed memory slots on arm64 are currently only supported
>   if they are intended for shared memory use cases (i.e.,
>   kvm_memslot_is_gmem_only() is true). This design reflects the current
>   arm64 KVM ecosystem where guest_memfd is primarily being introduced
>   for VMs that support shared memory.
> 
> Reviewed-by: James Houghton <jthoughton@google.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 4 ++++
>  arch/arm64/kvm/Kconfig            | 2 ++
>  arch/arm64/kvm/mmu.c              | 7 +++++++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3e41a880b062..63f7827cfa1b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1674,5 +1674,9 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
>  void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
>  void check_feature_map(void);
>  
> +#ifdef CONFIG_KVM_GMEM
> +#define kvm_arch_supports_gmem(kvm) true
> +#define kvm_arch_supports_gmem_mmap(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP)
> +#endif

nit: these two lines should be trivially 'true', and the #ifdef-ery
removed, since both KVM_GMEM and KVM_GMEM_SUPPORTS_MMAP are always
selected, no ifs, no buts.

>  
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 713248f240e0..323b46b7c82f 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -37,6 +37,8 @@ menuconfig KVM
>  	select HAVE_KVM_VCPU_RUN_PID_CHANGE
>  	select SCHED_INFO
>  	select GUEST_PERF_EVENTS if PERF_EVENTS
> +	select KVM_GMEM
> +	select KVM_GMEM_SUPPORTS_MMAP
>  	help
>  	  Support hosting virtualized guest machines.
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 8c82df80a835..85559b8a0845 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -2276,6 +2276,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  	if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT))
>  		return -EFAULT;
>  
> +	/*
> +	 * Only support guest_memfd backed memslots with mappable memory, since
> +	 * there aren't any CoCo VMs that support only private memory on arm64.
> +	 */
> +	if (kvm_slot_has_gmem(new) && !kvm_memslot_is_gmem_only(new))
> +		return -EINVAL;
> +
>  	hva = new->userspace_addr;
>  	reg_end = hva + (new->npages << PAGE_SHIFT);
>  

Otherwise,

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 18/21] KVM: arm64: Enable host mapping of shared guest_memfd memory
  2025-07-23  8:33   ` Marc Zyngier
@ 2025-07-23  9:18     ` Fuad Tabba
  0 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-23  9:18 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
	akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

Hi Marc,

On Wed, 23 Jul 2025 at 09:33, Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 17 Jul 2025 17:27:28 +0100,
> Fuad Tabba <tabba@google.com> wrote:
> >
> > Enable host userspace mmap support for guest_memfd-backed memory on
> > arm64. This change provides arm64 with the capability to map guest
> > memory at the host directly from guest_memfd:
> >
> > * Define kvm_arch_supports_gmem_mmap() for arm64: The
> >   kvm_arch_supports_gmem_mmap() macro is defined for arm64 to be true if
> >   CONFIG_KVM_GMEM_SUPPORTS_MMAP is enabled. For existing arm64 KVM VM
> >   types that support guest_memfd, this enables them to use guest_memfd
> >   with host userspace mappings. This provides a consistent behavior as
> >   there are currently no arm64 CoCo VMs that rely on guest_memfd solely
> >   for private, non-mappable memory. Future arm64 VM types can override
> >   or restrict this behavior via the kvm_arch_supports_gmem_mmap() hook
> >   if needed.
> >
> > * Select CONFIG_KVM_GMEM_SUPPORTS_MMAP in arm64 Kconfig.
> >
> > * Enforce KVM_MEMSLOT_GMEM_ONLY for guest_memfd on arm64: Checks are
> >   added to ensure that if guest_memfd is enabled on arm64,
> >   KVM_GMEM_SUPPORTS_MMAP must also be enabled. This means
> >   guest_memfd-backed memory slots on arm64 are currently only supported
> >   if they are intended for shared memory use cases (i.e.,
> >   kvm_memslot_is_gmem_only() is true). This design reflects the current
> >   arm64 KVM ecosystem where guest_memfd is primarily being introduced
> >   for VMs that support shared memory.
> >
> > Reviewed-by: James Houghton <jthoughton@google.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 4 ++++
> >  arch/arm64/kvm/Kconfig            | 2 ++
> >  arch/arm64/kvm/mmu.c              | 7 +++++++
> >  3 files changed, 13 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 3e41a880b062..63f7827cfa1b 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1674,5 +1674,9 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
> >  void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
> >  void check_feature_map(void);
> >
> > +#ifdef CONFIG_KVM_GMEM
> > +#define kvm_arch_supports_gmem(kvm) true
> > +#define kvm_arch_supports_gmem_mmap(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP)
> > +#endif
>
> nit: these two lines should be trivially 'true', and the #ifdef-ery
> removed, since both KVM_GMEM and KVM_GMEM_SUPPORTS_MMAP are always
> selected, no ifs, no buts.

I'll fix these.

> >
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 713248f240e0..323b46b7c82f 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -37,6 +37,8 @@ menuconfig KVM
> >       select HAVE_KVM_VCPU_RUN_PID_CHANGE
> >       select SCHED_INFO
> >       select GUEST_PERF_EVENTS if PERF_EVENTS
> > +     select KVM_GMEM
> > +     select KVM_GMEM_SUPPORTS_MMAP
> >       help
> >         Support hosting virtualized guest machines.
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 8c82df80a835..85559b8a0845 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -2276,6 +2276,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >       if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT))
> >               return -EFAULT;
> >
> > +     /*
> > +      * Only support guest_memfd backed memslots with mappable memory, since
> > +      * there aren't any CoCo VMs that support only private memory on arm64.
> > +      */
> > +     if (kvm_slot_has_gmem(new) && !kvm_memslot_is_gmem_only(new))
> > +             return -EINVAL;
> > +
> >       hva = new->userspace_addr;
> >       reg_end = hva + (new->npages << PAGE_SHIFT);
> >
>
> Otherwise,
>
> Reviewed-by: Marc Zyngier <maz@kernel.org>

Thanks for the reviews!

Cheers,
/fuad

>         M.
>
> --
> Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
  2025-07-22 23:42               ` Sean Christopherson
@ 2025-07-23  9:22                 ` Fuad Tabba
  0 siblings, 0 replies; 86+ messages in thread
From: Fuad Tabba @ 2025-07-23  9:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
	anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Wed, 23 Jul 2025 at 00:42, Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jul 22, 2025, Fuad Tabba wrote:
> > On Tue, 22 Jul 2025 at 16:58, Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Tue, Jul 22, 2025, Fuad Tabba wrote:
> > > > On Mon, 21 Jul 2025 at 18:33, Sean Christopherson <seanjc@google.com> wrote:
> > > > >
> > > > > On Mon, Jul 21, 2025, Fuad Tabba wrote:
> > > > > > > The below diff applies on top.  I'm guessing there may be some intermediate
> > > > > > > ugliness (I haven't mapped out exactly where/how to squash this throughout the
> > > > > > > series, and there is feedback relevant to future patches), but IMO this is a much
> > > > > > > cleaner resting state (see the diff stats).
> > > > > >
> > > > > > So just so that I am clear, applying the diff below to the appropriate
> > > > > > patches would address all the concerns that you have mentioned in this
> > > > > > email?
> > > > >
> > > > > Yes?  It should, I just don't want to pinky swear in case I botched something.
> > > >
> > > > Other than this patch not applying, nah, I think it's all good ;P. I
> > > > guess base-commit: 9eba3a9ac9cd5922da7f6e966c01190f909ed640 is
> > > > somewhere in a local tree of yours. There are quite a few conflicts
> > > > and I don't think it would build even if based on the right tree,
> > > > e.g.,  KVM_CAP_GUEST_MEMFD_MMAP is a rename of KVM_CAP_GMEM_MMAP,
> > > > rather than an addition of an undeclared identifier.
> > > >
> > > > That said, I think I understand what you mean, and I can apply the
> > > > spirit of this patch.
> > > >
> > > > Stay tuned for v16.
> > >
> > > Want to point me at your branch?  I can run it through my battery of tests, and
> > > maybe save you/us from having to spin a v17.
> >
> > That would be great. Here it is:
> >
> > https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-basic-6.16-v16
> >
> > No known issues from my end. But can you have a look at the patch:
> >
> > KVM: guest_memfd: Consolidate Kconfig and guest_memfd enable checks
> >
> > In that I collected the changes to the config/enable checks that
> > didn't seem to fit well in any of the other patches.
>
> Regarding config stuff, patch 02, KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
> CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE, is missing a KVM_GMEM => KVM_GUEST_MEMFD rename.
>
> While playing with this, I also discovered why this code lives in the KVM_X86 config:
>
>   select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
>
> Commit ea4290d77bda ("KVM: x86: leave kvm.ko out of the build if no vendor module
> is requested") didn't have all the vendor neutral configs depend on KVM_X86, and
> so it's possible to end up with unmet dependencies.  E.g. KVM_SW_PROTECTED_VM can
> be selected with KVM_X86=n, and thus with KVM_GUEST_MEMFD=n.
>
> We could punt on that mess until after this series, but that'd be even more
> churn, and I'm not sure I could stomach giving acks for the continued addition
> of ugly kconfig dependencies. :-)
>
> Lastly, regarding "Consolidate Kconfig and guest_memfd enable checks", that needs
> to land before f6a5f3a22bbe ("KVM: guest_memfd: Allow host to map guest_memfd pages"),
> otherwise KVM will present a weird state where guest_memfd can be used for default
> VMs, but if and only if KVM_GUEST_MEMFD happens to be selected by something else.
> That also provides a better shortlog: "KVM: x86: Enable KVM_GUEST_MEMFD for all
> 64-bit builds".  The config cleanups and consolidations are a nice side effect,
> but what that patch is really doing is enabling KVM_GUEST_MEMFD more broadly.
>
> Actually, all of the arch patches need to come before f6a5f3a22bbe ("KVM: guest_memfd:
> Allow host to map guest_memfd pages"), otherwise intermediate builds will have
> half-baked support for guest_memfd mmap().  Or rather, KVM shouldn't let userspace
> enable GUEST_MEMFD_FLAG_MMAP until all the plumbing is in place.  I suspect that
> trying to shuffle the full patches around will create cyclical dependency hell.
> It's easy enough to hold off on adding GUEST_MEMFD_FLAG_MMAP until KVM is fully
> ready, so I think it makes sense to just add GUEST_MEMFD_FLAG_MMAP along with the
> capability.
>
> Rather than trying to pass partial patches around, I pushed a branch to:
>
>   https://github.com/sean-jc/linux.git x86/gmem_mmap
>
> Outside of the x86 config crud, and deferring GUEST_MEMFD_FLAG_MMAP until KVM is
> fully prepped, there _shouldn't_ be any changes relative to what you have.
>
> Note, it's based on:
>
>   https://github.com/kvm-x86/linux.git next
>
> as there are x86 kconfig dependencies/conflicts with changes that are destined
> for 6.17 (and I don't think landing this in 6.17 is realistic, i.e. this series
> will effectively follow kvm-x86/next no matter what).
>
> I haven't done a ton of runtime testing yet, but it passes all of my build tests
> (I have far too many configs), so I'm reasonably confident all the kconfig stuff
> isn't horribly broken.
>
> Oh, and I also squashed this into the very last patch.  The curly braces, line
> wrap, and hardcoded boolean are all superfluous.

Thank you for this. These patches look good to me.

I've tested them on x86 and arm64, and everything runs fine. I'll have
a closer look at them, and probably send v16 later today.

I know that it's probably too late for 6.17, but it would be great if
we could queue this for 6.18.

Cheers,
/fuad


> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 4cdccabc160c..a0c5db8fd72d 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -249,8 +249,7 @@ static bool check_vm_type(unsigned long vm_type)
>         return kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type);
>  }
>
> -static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
> -                          bool expect_mmap_allowed)
> +static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags)
>  {
>         struct kvm_vm *vm;
>         size_t total_size;
> @@ -272,7 +271,7 @@ static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
>
>         test_file_read_write(fd);
>
> -       if (expect_mmap_allowed) {
> +       if (guest_memfd_flags & GUEST_MEMFD_FLAG_MMAP) {
>                 test_mmap_supported(fd, page_size, total_size);
>                 test_fault_overflow(fd, page_size, total_size);
>
> @@ -343,13 +342,11 @@ int main(int argc, char *argv[])
>
>         test_gmem_flag_validity();
>
> -       test_with_type(VM_TYPE_DEFAULT, 0, false);
> -       if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MMAP)) {
> -               test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_MMAP,
> -                              true);
> -       }
> +       test_with_type(VM_TYPE_DEFAULT, 0);
> +       if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MMAP))
> +               test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_MMAP);
>
>  #ifdef __x86_64__
> -       test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
> +       test_with_type(KVM_X86_SW_PROTECTED_VM, 0);
>  #endif
>  }



* Re: [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults
  2025-07-23  8:20     ` Marc Zyngier
@ 2025-07-23 11:44       ` Kunwu Chan
  0 siblings, 0 replies; 86+ messages in thread
From: Kunwu Chan @ 2025-07-23 11:44 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, seanjc, viro,
	brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
	ackerleytng, mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx, pankaj.gupta, ira.weiny

On 2025/7/23 16:20, Marc Zyngier wrote:
> On Tue, 22 Jul 2025 13:31:34 +0100,
> Kunwu Chan <kunwu.chan@linux.dev> wrote:
>> On 2025/7/18 00:27, Fuad Tabba wrote:
>>> Add arm64 architecture support for handling guest page faults on memory
>>> slots backed by guest_memfd.
>>>
>>> This change introduces a new function, gmem_abort(), which encapsulates
>>> the fault handling logic specific to guest_memfd-backed memory. The
>>> kvm_handle_guest_abort() entry point is updated to dispatch to
>>> gmem_abort() when a fault occurs on a guest_memfd-backed memory slot (as
>>> determined by kvm_slot_has_gmem()).
>>>
>>> Until guest_memfd gains support for huge pages, the fault granule for
>>> these memory regions is restricted to PAGE_SIZE.
>> Since huge pages are not currently supported, would it be more
>> friendly to define  sth like
>>
>> "#define GMEM_PAGE_GRANULE PAGE_SIZE" at the top (rather than
>> hardcoding PAGE_SIZE)
>>
>>   and make it easier to switch to huge page support later?
> No. PAGE_SIZE always has to be the fallback, no matter what. When (and
> if) larger mappings get supported, there will be extra code for that
> purpose, not just flipping a definition.
>
> Thanks,
>
> 	M.

Got it, no questions here. Feel free to add my "Reviewed-by" tag to the 
patch.

Reviewed-by: Tao Chan <chentao@kylinos.cn>

Thanks,
	TAO.
---
“Life finds a way.”




* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-22 14:35                   ` Sean Christopherson
@ 2025-07-23 14:08                     ` Vishal Annapurve
  2025-07-23 14:43                       ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Vishal Annapurve @ 2025-07-23 14:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Tue, Jul 22, 2025 at 7:35 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jul 21, 2025, Vishal Annapurve wrote:
> > On Mon, Jul 21, 2025 at 3:21 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Mon, Jul 21, 2025, Vishal Annapurve wrote:
> > > > On Mon, Jul 21, 2025 at 10:29 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > >
> > > > > >
> > > > > > > > 2) KVM fetches shared faults through userspace page tables and not
> > > > > > > > guest_memfd directly.
> > > > > > >
> > > > > > > This is also irrelevant.  KVM _already_ supports resolving shared faults through
> > > > > > > userspace page tables.  That support won't go away as KVM will always need/want
> > > > > > > to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).
> > > >
> > > > As a combination of [1] and [2], I believe we are saying that for
> > > > memslots backed by mappable guest_memfd files, KVM will always serve
> > > > both shared/private faults using kvm_gmem_get_pfn().
> > >
> > > No, KVM can't guarantee that with taking and holding mmap_lock across hva_to_pfn(),
> > > and as I mentioned earlier in the thread, that's a non-starter for me.
> >
> > I think what you mean is that if KVM wants to enforce the behavior
> > that VMAs passed by the userspace are backed by the same guest_memfd
> > file as passed in the memslot then KVM will need to hold mmap_lock
> > across hva_to_pfn() to verify that.
>
> No, I'm talking about the case where userspace creates a memslot *without*
> KVM_MEM_GUEST_MEMFD, but with userspace_addr pointing at a mmap()'d guest_memfd
> instance.  That is the scenario Xiaoyao brought up:
>
>  : Actually, QEMU can use gmem with mmap support as the normal memory even
>  : without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
>  : on KVM_SET_USER_MEMORY_REGION2.
>  :
>  : ...
>  :
>  : However, it fails actually, because the kvm_arch_supports_gmem_mmap()
>  : returns false for TDX VMs, which means userspace cannot allocate gmem
>  : with mmap just for shared memory for TDX.

Ok, yeah. I completely misjudged the usecase that Xiaoyao brought up.
You are right.

These are two different scenarios that I mixed up:
1) Userspace brings a non-mappable guest_memfd to back guest private
memory (passed as guest_memfd field in the
KVM_USERSPACE_MEMORY_REGION2) : This is the legacy case that needs
separate memory to back userspace_addr. As Sean mentioned, userspace
should be able to bring VMAs backed by any mappable files, including
guest_memfd, except that mappable guest_memfd is not supported for
SNP/TDX VMs today; that support will come in stage 2. KVM doesn't need to
enforce anything here as we can be sure that VMAs and unmappable
guest_memfd are pointing to different physical ranges.

2) Userspace brings a mappable guest_memfd to back guest private
memory (passed as guest_memfd field in the
KVM_USERSPACE_MEMORY_REGION2): KVM will always fault in all guest
faults via guest_memfd so if userspace brings in VMAs that point to
different physical memory then there would be a discrepancy between
what guest and userspace/KVM (going through HVAs) sees for shared
memory ranges. I am not sure if KVM needs to enforce anything here,
IMO it's the problem between userspace and guest to resolve. One thing
we may need to resolve is that invalidations of KVM EPT/NPT tables for
shared ranges should be triggered only by guest_memfd invalidations
(This is something we need to resolve when conversions are supported
on guest_memfd, i.e. not in this series but in the next stage).



* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-23 14:08                     ` Vishal Annapurve
@ 2025-07-23 14:43                       ` Sean Christopherson
  2025-07-23 14:46                         ` David Hildenbrand
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2025-07-23 14:43 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Xiaoyao Li, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On Wed, Jul 23, 2025, Vishal Annapurve wrote:
> 2) Userspace brings a mappable guest_memfd to back guest private
> memory (passed as guest_memfd field in the
> KVM_USERSPACE_MEMORY_REGION2): KVM will always fault in all guest
> faults via guest_memfd so if userspace brings in VMAs that point to
> different physical memory then there would be a discrepancy between
> what guest and userspace/KVM (going through HVAs) sees for shared
> memory ranges. I am not sure if KVM needs to enforce anything here,

We agreed (I think in a guest_memfd call?) that KVM won't enforce anything,
because trying to do so for uaccesses, e.g. via __kvm_read_guest_page(), would
require grabbing mmap_lock in hot paths, i.e. would be a complete non-starter.

So yeah, it's the VMM's responsibility to not be stupid.



* Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
  2025-07-23 14:43                       ` Sean Christopherson
@ 2025-07-23 14:46                         ` David Hildenbrand
  0 siblings, 0 replies; 86+ messages in thread
From: David Hildenbrand @ 2025-07-23 14:46 UTC (permalink / raw)
  To: Sean Christopherson, Vishal Annapurve
  Cc: Xiaoyao Li, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
	brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vbabka, ackerleytng, mail,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
	hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
	pankaj.gupta, ira.weiny

On 23.07.25 16:43, Sean Christopherson wrote:
> On Wed, Jul 23, 2025, Vishal Annapurve wrote:
>> 2) Userspace brings a mappable guest_memfd to back guest private
>> memory (passed as guest_memfd field in the
>> KVM_USERSPACE_MEMORY_REGION2): KVM will always fault in all guest
>> faults via guest_memfd so if userspace brings in VMAs that point to
>> different physical memory then there would be a discrepancy between
>> what guest and userspace/KVM (going through HVAs) sees for shared
>> memory ranges. I am not sure if KVM needs to enforce anything here,
> 
> We agreed (I think in a guest_memfd call?) that KVM won't enforce anything,

Right. We'll document it but not sanity check it at KVM slot creation 
time. If the VMM does something stupid, not our problem.

-- 
Cheers,

David / dhildenb




end of thread, other threads:[~2025-07-23 14:46 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-07-21 15:17   ` Sean Christopherson
2025-07-21 15:26     ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-07-21 16:44   ` Sean Christopherson
2025-07-21 16:51     ` Fuad Tabba
2025-07-21 17:33       ` Sean Christopherson
2025-07-22  9:29         ` Fuad Tabba
2025-07-22 15:58           ` Sean Christopherson
2025-07-22 16:01             ` Fuad Tabba
2025-07-22 23:42               ` Sean Christopherson
2025-07-23  9:22                 ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem() Fuad Tabba
2025-07-18  1:42   ` Xiaoyao Li
2025-07-21 14:47     ` Sean Christopherson
2025-07-21 14:55     ` Fuad Tabba
2025-07-21 16:44   ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem Fuad Tabba
2025-07-21 16:45   ` Sean Christopherson
2025-07-21 17:00     ` Fuad Tabba
2025-07-21 19:09       ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 05/21] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 06/21] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 07/21] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 08/21] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-07-18  2:56   ` Xiaoyao Li
2025-07-17 16:27 ` [PATCH v15 09/21] KVM: guest_memfd: Track guest_memfd mmap support in memslot Fuad Tabba
2025-07-18  3:33   ` Xiaoyao Li
2025-07-17 16:27 ` [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level Fuad Tabba
2025-07-18  6:19   ` Xiaoyao Li
2025-07-21 19:46   ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level Fuad Tabba
2025-07-18  5:10   ` Xiaoyao Li
2025-07-21 23:17     ` Sean Christopherson
2025-07-22  5:35       ` Xiaoyao Li
2025-07-22 11:08         ` Fuad Tabba
2025-07-22 14:32           ` Sean Christopherson
2025-07-22 15:30             ` Fuad Tabba
2025-07-22 10:35       ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 12/21] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level Fuad Tabba
2025-07-18  5:32   ` Xiaoyao Li
2025-07-18  5:57     ` Xiaoyao Li
2025-07-17 16:27 ` [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
2025-07-18  6:09   ` Xiaoyao Li
2025-07-21 16:47   ` Sean Christopherson
2025-07-21 16:56     ` Fuad Tabba
2025-07-22  5:41     ` Xiaoyao Li
2025-07-22  8:43       ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type Fuad Tabba
2025-07-18  6:10   ` Xiaoyao Li
2025-07-21 12:22   ` Xiaoyao Li
2025-07-21 12:41     ` Fuad Tabba
2025-07-21 13:45     ` Vishal Annapurve
2025-07-21 14:42       ` Xiaoyao Li
2025-07-21 14:42       ` Sean Christopherson
2025-07-21 15:07         ` Xiaoyao Li
2025-07-21 17:29           ` Sean Christopherson
2025-07-21 20:33             ` Vishal Annapurve
2025-07-21 22:21               ` Sean Christopherson
2025-07-21 23:50                 ` Vishal Annapurve
2025-07-22 14:35                   ` Sean Christopherson
2025-07-23 14:08                     ` Vishal Annapurve
2025-07-23 14:43                       ` Sean Christopherson
2025-07-23 14:46                         ` David Hildenbrand
2025-07-22 14:28     ` Xiaoyao Li
2025-07-22 14:37       ` Sean Christopherson
2025-07-22 15:31         ` Xiaoyao Li
2025-07-22 15:50           ` David Hildenbrand
2025-07-22 15:54           ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 15/21] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
2025-07-22 12:31   ` Kunwu Chan
2025-07-23  8:20     ` Marc Zyngier
2025-07-23 11:44       ` Kunwu Chan
2025-07-23  8:26   ` Marc Zyngier
2025-07-17 16:27 ` [PATCH v15 17/21] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd Fuad Tabba
2025-07-23  8:29   ` Marc Zyngier
2025-07-17 16:27 ` [PATCH v15 18/21] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
2025-07-23  8:33   ` Marc Zyngier
2025-07-23  9:18     ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP Fuad Tabba
2025-07-18  6:14   ` Xiaoyao Li
2025-07-21 17:31   ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 20/21] KVM: selftests: Do not use hardcoded page sizes in guest_memfd test Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 21/21] KVM: selftests: guest_memfd mmap() test when mmap is supported Fuad Tabba
