* [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
@ 2025-04-30 16:56 Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 01/13] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
` (12 more replies)
0 siblings, 13 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Main changes since v7 [1]:
- Renaming/refactoring to decouple whether guest memory is private
from whether it is backed by guest_memfd
- Drop folio_put() callback patches
- Fixes based on feedback from the previous series
- Rebase on Linux 6.15-rc4
The purpose of this series is to allow mapping guest_memfd backed memory
at the host. This support enables VMMs like Firecracker to run VM guests
backed completely by guest_memfd [2]. Combined with Patrick's series for
direct map removal in guest_memfd [3], this would allow running VMs that
offer additional hardening against Spectre-like transient execution
attacks.
This series will also serve as a base for _restricted_ mmap() support
for guest_memfd backed memory at the host for CoCos that allow sharing
guest memory in-place with the host [4].
Patches 1 to 7 are mainly about decoupling the concept of guest memory
being private from that of it being backed by guest_memfd. They are
mostly refactoring and renaming.
Patch 8 adds support for in-place shared memory, as well as the ability
for the host to map it as long as it is shared, gated by a new
configuration option and advertised to userspace by a new capability.
Patches 9 to 12 add arm64 and x86 support for in-place shared memory.
Patch 13 expands the guest_memfd selftest to test in-place shared memory
when available.
To test this patch series on x86 (I use a standard Debian image):
Build:
- Build the kernel with the following config options enabled:
defconfigs:
x86_64_defconfig
kvm_guest.config
Additional config options to enable (see the sketch at the end of this Build section):
KVM_SW_PROTECTED_VM
KVM_GMEM_SHARED_MEM
- Build the KVM selftests in tools/testing/selftests/kvm; you
only need guest_memfd_test, e.g.:
make EXTRA_CFLAGS="-static -DDEBUG" -C tools/testing/selftests/kvm
- Build kvmtool [5] lkvm-static (I build it on a different machine).
make lkvm-static
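For reference, a minimal sketch of one way to set the config options
listed above (this uses the standard kbuild scripts/config helper; the
option names come from this series):
make x86_64_defconfig kvm_guest.config
./scripts/config --enable KVM_SW_PROTECTED_VM --enable KVM_GMEM_SHARED_MEM
make olddefconfig
make -j$(nproc) bzImage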
Run:
Boot your Linux image with the kernel you built above.
You can run the selftest as is:
./guest_memfd_test
For kvmtool, where bzImage is the same as the host's:
./lkvm-static run -c 2 -m 512 -p "break=mount" --kernel bzImage --debug --guest_memfd --sw_protected
To test this patch series on arm64 (I use a standard Debian image):
Build:
- Build the kernel with defconfig
- Build the KVM selftests in tools/testing/selftests/kvm; you
only need guest_memfd_test.
- Build kvmtool [5] lkvm-static (I cross compile it on a different machine).
You are likely to need libfdt as well.
For libfdt (in the same directory as kvmtool):
git clone git://git.kernel.org/pub/scm/utils/dtc/dtc.git
cd dtc
export CC=aarch64-linux-gnu-gcc
make
cd ..
Then for kvmtool:
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LIBFDT_DIR=./dtc/libfdt/ lkvm-static
Run:
Boot your Linux image with the kernel you built above.
You can run the selftest as is:
./guest_memfd_test
For kvmtool, where Image is the same as the host's, and rootfs is
your rootfs image (in case kvmtool can't figure it out):
./lkvm-static run -c 2 -m 512 -d rootfs --kernel Image --force-pci --irqchip gicv3 --debug --guest_memfd --sw_protected
You can find (potentially slightly outdated) instructions on how
to run a full arm64 system stack under QEMU here [6].
Cheers,
/fuad
[1] https://lore.kernel.org/all/20250318161823.4005529-1-tabba@google.com/
[2] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[3] https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
[4] https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
[5] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/guestmem-basic-6.15
[6] https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/docs/qemu/qemu-arm64-howto.html
Fuad Tabba (13):
KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
CONFIG_KVM_GENERIC_GMEM_POPULATE
KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
KVM: x86: Generalize private fault lookups to guest_memfd fault
lookups
KVM: Fix comments that refer to slots_lock
KVM: guest_memfd: Allow host to map guest_memfd() pages
KVM: arm64: Refactor user_mem_abort() calculation of force_pte
KVM: arm64: Handle guest_memfd()-backed guest page faults
KVM: arm64: Enable mapping guest_memfd in arm64
KVM: x86: KVM_X86_SW_PROTECTED_VM to support guest_memfd shared memory
KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is
allowed
arch/arm64/include/asm/kvm_host.h | 12 +++
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/mmu.c | 76 +++++++++------
arch/x86/include/asm/kvm_host.h | 17 ++--
arch/x86/kvm/Kconfig | 4 +-
arch/x86/kvm/mmu/mmu.c | 31 +++---
arch/x86/kvm/svm/sev.c | 4 +-
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/x86.c | 3 +-
include/linux/kvm_host.h | 44 +++++++--
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 75 +++++++++++++--
virt/kvm/Kconfig | 15 ++-
virt/kvm/Makefile.kvm | 2 +-
virt/kvm/guest_memfd.c | 96 ++++++++++++++++++-
virt/kvm/kvm_main.c | 21 ++--
virt/kvm/kvm_mm.h | 4 +-
18 files changed, 316 insertions(+), 95 deletions(-)
base-commit: b4432656b36e5cc1d50a1f2dc15357543add530e
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 01/13] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-05-01 17:38 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 02/13] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
` (11 subsequent siblings)
12 siblings, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
The option KVM_PRIVATE_MEM enables guest_memfd in general. Subsequent
patches add shared memory support to guest_memfd. Therefore, rename it
to KVM_GMEM to make its purpose clearer.
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
include/linux/kvm_host.h | 10 +++++-----
virt/kvm/Kconfig | 8 ++++----
virt/kvm/Makefile.kvm | 2 +-
virt/kvm/kvm_main.c | 4 ++--
virt/kvm/kvm_mm.h | 4 ++--
6 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7bc174a1f1cb..52f6f6d08558 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2253,7 +2253,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
int tdp_max_root_level, int tdp_huge_page_level);
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
#else
#define kvm_arch_has_private_mem(kvm) false
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 291d49b9bf05..d6900995725d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -601,7 +601,7 @@ struct kvm_memory_slot {
short id;
u16 as_id;
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
struct {
/*
* Writes protected by kvm->slots_lock. Acquiring a
@@ -722,7 +722,7 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
* Arch code must define kvm_arch_has_private_mem if support for private memory
* is enabled.
*/
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
+#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
{
return false;
@@ -2504,7 +2504,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
- return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
+ return IS_ENABLED(CONFIG_KVM_GMEM) &&
kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}
#else
@@ -2514,7 +2514,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
}
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
int *max_order);
@@ -2527,7 +2527,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
KVM_BUG_ON(1, kvm);
return -EIO;
}
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 727b542074e7..49df4e32bff7 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -112,19 +112,19 @@ config KVM_GENERIC_MEMORY_ATTRIBUTES
depends on KVM_GENERIC_MMU_NOTIFIER
bool
-config KVM_PRIVATE_MEM
+config KVM_GMEM
select XARRAY_MULTI
bool
config KVM_GENERIC_PRIVATE_MEM
select KVM_GENERIC_MEMORY_ATTRIBUTES
- select KVM_PRIVATE_MEM
+ select KVM_GMEM
bool
config HAVE_KVM_ARCH_GMEM_PREPARE
bool
- depends on KVM_PRIVATE_MEM
+ depends on KVM_GMEM
config HAVE_KVM_ARCH_GMEM_INVALIDATE
bool
- depends on KVM_PRIVATE_MEM
+ depends on KVM_GMEM
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 724c89af78af..8d00918d4c8b 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -12,4 +12,4 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
-kvm-$(CONFIG_KVM_PRIVATE_MEM) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_KVM_GMEM) += $(KVM)/guest_memfd.o
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e85b33a92624..4996cac41a8f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4842,7 +4842,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
case KVM_CAP_MEMORY_ATTRIBUTES:
return kvm_supported_mem_attributes(kvm);
#endif
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
case KVM_CAP_GUEST_MEMFD:
return !kvm || kvm_arch_has_private_mem(kvm);
#endif
@@ -5276,7 +5276,7 @@ static long kvm_vm_ioctl(struct file *filp,
case KVM_GET_STATS_FD:
r = kvm_vm_ioctl_get_stats_fd(kvm);
break;
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
case KVM_CREATE_GUEST_MEMFD: {
struct kvm_create_guest_memfd guest_memfd;
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index acef3f5c582a..ec311c0d6718 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -67,7 +67,7 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
}
#endif /* HAVE_KVM_PFNCACHE */
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
void kvm_gmem_init(struct module *module);
int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -91,6 +91,6 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
{
WARN_ON_ONCE(1);
}
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
#endif /* __KVM_MM_H__ */
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 02/13] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 01/13] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-05-01 18:10 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 03/13] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
` (10 subsequent siblings)
12 siblings, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
clearer.
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/kvm/Kconfig | 4 ++--
include/linux/kvm_host.h | 2 +-
virt/kvm/Kconfig | 2 +-
virt/kvm/guest_memfd.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fe8ea8c097de..b37258253543 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -46,7 +46,7 @@ config KVM_X86
select HAVE_KVM_PM_NOTIFIER if PM
select KVM_GENERIC_HARDWARE_ENABLING
select KVM_GENERIC_PRE_FAULT_MEMORY
- select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+ select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
select KVM_WERROR if WERROR
config KVM
@@ -145,7 +145,7 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
- select KVM_GENERIC_PRIVATE_MEM
+ select KVM_GENERIC_GMEM_POPULATE
select HAVE_KVM_ARCH_GMEM_PREPARE
select HAVE_KVM_ARCH_GMEM_INVALIDATE
help
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d6900995725d..7ca23837fa52 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2533,7 +2533,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
#endif
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
/**
* kvm_gmem_populate() - Populate/prepare a GPA range with guest data
*
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 49df4e32bff7..559c93ad90be 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -116,7 +116,7 @@ config KVM_GMEM
select XARRAY_MULTI
bool
-config KVM_GENERIC_PRIVATE_MEM
+config KVM_GENERIC_GMEM_POPULATE
select KVM_GENERIC_MEMORY_ATTRIBUTES
select KVM_GMEM
bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..befea51bbc75 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -638,7 +638,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
kvm_gmem_populate_cb post_populate, void *opaque)
{
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 03/13] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 01/13] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 02/13] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-05-01 18:18 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 04/13] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
` (9 subsequent siblings)
12 siblings, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
The function kvm_arch_has_private_mem() is used to indicate whether
guest_memfd is supported by the architecture, which until now has
implied that such memory is private. To decouple guest_memfd support
from whether the memory is private, rename this function to
kvm_arch_supports_gmem().
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 8 ++++----
arch/x86/kvm/mmu/mmu.c | 8 ++++----
include/linux/kvm_host.h | 6 +++---
virt/kvm/kvm_main.c | 6 +++---
4 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52f6f6d08558..4a83fbae7056 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2254,9 +2254,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
#ifdef CONFIG_KVM_GMEM
-#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
#else
-#define kvm_arch_has_private_mem(kvm) false
+#define kvm_arch_supports_gmem(kvm) false
#endif
#define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
@@ -2309,8 +2309,8 @@ enum {
#define HF_SMM_INSIDE_NMI_MASK (1 << 2)
# define KVM_MAX_NR_ADDRESS_SPACES 2
-/* SMM is currently unsupported for guests with private memory. */
-# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)
+/* SMM is currently unsupported for guests with guest_memfd (esp private) memory. */
+# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_supports_gmem(kvm) ? 1 : 2)
# define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
#else
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 63bb77ee1bb1..7d654506d800 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4917,7 +4917,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
if (r)
return r;
- if (kvm_arch_has_private_mem(vcpu->kvm) &&
+ if (kvm_arch_supports_gmem(vcpu->kvm) &&
kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))
error_code |= PFERR_PRIVATE_ACCESS;
@@ -7683,7 +7683,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
* Zapping SPTEs in this case ensures KVM will reassess whether or not
* a hugepage can be used for affected ranges.
*/
- if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
return false;
/* Unmap the old attribute page. */
@@ -7746,7 +7746,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
* a range that has PRIVATE GFNs, and conversely converting a range to
* SHARED may now allow hugepages.
*/
- if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
return false;
/*
@@ -7802,7 +7802,7 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
{
int level;
- if (!kvm_arch_has_private_mem(kvm))
+ if (!kvm_arch_supports_gmem(kvm))
return;
for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7ca23837fa52..6ca7279520cf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -719,11 +719,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
#endif
/*
- * Arch code must define kvm_arch_has_private_mem if support for private memory
+ * Arch code must define kvm_arch_supports_gmem if support for guest_memfd
* is enabled.
*/
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
-static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
+#if !defined(kvm_arch_supports_gmem) && !IS_ENABLED(CONFIG_KVM_GMEM)
+static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
{
return false;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4996cac41a8f..2468d50a9ed4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1531,7 +1531,7 @@ static int check_memory_region_flags(struct kvm *kvm,
{
u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
- if (kvm_arch_has_private_mem(kvm))
+ if (kvm_arch_supports_gmem(kvm))
valid_flags |= KVM_MEM_GUEST_MEMFD;
/* Dirty logging private memory is not currently supported. */
@@ -2362,7 +2362,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
static u64 kvm_supported_mem_attributes(struct kvm *kvm)
{
- if (!kvm || kvm_arch_has_private_mem(kvm))
+ if (!kvm || kvm_arch_supports_gmem(kvm))
return KVM_MEMORY_ATTRIBUTE_PRIVATE;
return 0;
@@ -4844,7 +4844,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
#endif
#ifdef CONFIG_KVM_GMEM
case KVM_CAP_GUEST_MEMFD:
- return !kvm || kvm_arch_has_private_mem(kvm);
+ return !kvm || kvm_arch_supports_gmem(kvm);
#endif
default:
break;
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 04/13] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (2 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 03/13] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-05-01 18:19 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 05/13] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
` (8 subsequent siblings)
12 siblings, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
The bool has_private_mem is used to indicate whether guest_memfd is
supported. Rename it to supports_gmem to make its meaning clearer and
to decouple the concept of memory being private from being backed by
guest_memfd.
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 4 ++--
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/x86.c | 3 +--
4 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a83fbae7056..709cc2a7ba66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1331,7 +1331,7 @@ struct kvm_arch {
unsigned int indirect_shadow_pages;
u8 mmu_valid_gen;
u8 vm_type;
- bool has_private_mem;
+ bool supports_gmem;
bool has_protected_state;
bool pre_fault_allowed;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
@@ -2254,7 +2254,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
#ifdef CONFIG_KVM_GMEM
-#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
#else
#define kvm_arch_supports_gmem(kvm) false
#endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7d654506d800..734d71ec97ef 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3486,7 +3486,7 @@ static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault
* on RET_PF_SPURIOUS until the update completes, or an actual spurious
* case might go down the slow path. Either case will resolve itself.
*/
- if (kvm->arch.has_private_mem &&
+ if (kvm->arch.supports_gmem &&
fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
return false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d5d0c5c3300b..b391dd6208cf 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5048,8 +5048,8 @@ static int svm_vm_init(struct kvm *kvm)
(type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
to_kvm_sev_info(kvm)->need_init = true;
- kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
- kvm->arch.pre_fault_allowed = !kvm->arch.has_private_mem;
+ kvm->arch.supports_gmem = (type == KVM_X86_SNP_VM);
+ kvm->arch.pre_fault_allowed = !kvm->arch.supports_gmem;
}
if (!pause_filter_count || !pause_filter_thresh)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index df5b99ea1f18..5b11ef131d5c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12716,8 +12716,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return -EINVAL;
kvm->arch.vm_type = type;
- kvm->arch.has_private_mem =
- (type == KVM_X86_SW_PROTECTED_VM);
+ kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
/* Decided by the vendor code for other VM types. */
kvm->arch.pre_fault_allowed =
type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 05/13] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (3 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 04/13] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-05-01 21:37 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups Fuad Tabba
` (7 subsequent siblings)
12 siblings, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
The function kvm_slot_can_be_private() is used to check whether a memory
slot is backed by guest_memfd. Rename it to kvm_slot_has_gmem() to make
that clearer and to decouple the concept of memory being private from
being backed by guest_memfd.
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/kvm/mmu/mmu.c | 4 ++--
arch/x86/kvm/svm/sev.c | 4 ++--
include/linux/kvm_host.h | 2 +-
virt/kvm/guest_memfd.c | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 734d71ec97ef..6d5dd869c890 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3283,7 +3283,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn)
{
- bool is_private = kvm_slot_can_be_private(slot) &&
+ bool is_private = kvm_slot_has_gmem(slot) &&
kvm_mem_is_private(kvm, gfn);
return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
@@ -4496,7 +4496,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
{
int max_order, r;
- if (!kvm_slot_can_be_private(fault->slot)) {
+ if (!kvm_slot_has_gmem(fault->slot)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0bc708ee2788..fbf55821d62e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2378,7 +2378,7 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
mutex_lock(&kvm->slots_lock);
memslot = gfn_to_memslot(kvm, params.gfn_start);
- if (!kvm_slot_can_be_private(memslot)) {
+ if (!kvm_slot_has_gmem(memslot)) {
ret = -EINVAL;
goto out;
}
@@ -4682,7 +4682,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
}
slot = gfn_to_memslot(kvm, gfn);
- if (!kvm_slot_can_be_private(slot)) {
+ if (!kvm_slot_has_gmem(slot)) {
pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
gpa);
return;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6ca7279520cf..d9616ee6acc7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -614,7 +614,7 @@ struct kvm_memory_slot {
#endif
};
-static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
+static inline bool kvm_slot_has_gmem(const struct kvm_memory_slot *slot)
{
return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
}
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index befea51bbc75..6db515833f61 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -654,7 +654,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
return -EINVAL;
slot = gfn_to_memslot(kvm, start_gfn);
- if (!kvm_slot_can_be_private(slot))
+ if (!kvm_slot_has_gmem(slot))
return -EINVAL;
file = kvm_gmem_get_file(slot);
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (4 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 05/13] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-04-30 18:58 ` Ackerley Tng
2025-05-01 21:38 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock Fuad Tabba
` (6 subsequent siblings)
12 siblings, 2 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Until now, faults to private memory backed by guest_memfd have always
been consumed from guest_memfd, whereas faults to shared memory have
been consumed from anonymous memory. Subsequent patches will allow
sharing guest_memfd backed memory in-place and mapping it by the host.
Faults to in-place shared memory should be consumed from guest_memfd as
well.
In order to facilitate that, generalize the fault lookups. Since, for
now, only private memory is consumed from guest_memfd, this patch does
not change the behavior.
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/kvm/mmu/mmu.c | 19 +++++++++----------
include/linux/kvm_host.h | 6 ++++++
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d5dd869c890..08eebd24a0e1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3258,7 +3258,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot,
- gfn_t gfn, int max_level, bool is_private)
+ gfn_t gfn, int max_level, bool is_gmem)
{
struct kvm_lpage_info *linfo;
int host_level;
@@ -3270,7 +3270,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
break;
}
- if (is_private)
+ if (is_gmem)
return max_level;
if (max_level == PG_LEVEL_4K)
@@ -3283,10 +3283,9 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn)
{
- bool is_private = kvm_slot_has_gmem(slot) &&
- kvm_mem_is_private(kvm, gfn);
+ bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);
- return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
+ return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_gmem);
}
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -4465,7 +4464,7 @@ static inline u8 kvm_max_level_for_order(int order)
return PG_LEVEL_4K;
}
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
+static u8 kvm_max_gmem_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
u8 max_level, int gmem_order)
{
u8 req_max_level;
@@ -4491,7 +4490,7 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
r == RET_PF_RETRY, fault->map_writable);
}
-static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
+static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
int max_order, r;
@@ -4509,8 +4508,8 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
}
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
- fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
- fault->max_level, max_order);
+ fault->max_level = kvm_max_gmem_mapping_level(vcpu->kvm, fault->pfn,
+ fault->max_level, max_order);
return RET_PF_CONTINUE;
}
@@ -4521,7 +4520,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
unsigned int foll = fault->write ? FOLL_WRITE : 0;
if (fault->is_private)
- return kvm_mmu_faultin_pfn_private(vcpu, fault);
+ return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
foll |= FOLL_NOWAIT;
fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d9616ee6acc7..cdcd7ac091b5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2514,6 +2514,12 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
}
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
+static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
+{
+ /* For now, only private memory gets consumed from guest_memfd. */
+ return kvm_mem_is_private(kvm, gfn);
+}
+
#ifdef CONFIG_KVM_GMEM
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (5 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-04-30 21:30 ` David Hildenbrand
2025-05-01 21:43 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
` (5 subsequent siblings)
12 siblings, 2 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fix comments so that they refer to slots_lock instead of slots_locks
(remove trailing s).
Signed-off-by: Fuad Tabba <tabba@google.com>
---
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cdcd7ac091b5..9419fb99f7c2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -859,7 +859,7 @@ struct kvm {
struct notifier_block pm_notifier;
#endif
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
- /* Protected by slots_locks (for writes) and RCU (for reads) */
+ /* Protected by slots_lock (for writes) and RCU (for reads) */
struct xarray mem_attr_array;
#endif
char stats_id[KVM_STATS_NAME_SIZE];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2468d50a9ed4..6289ea1685dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -333,7 +333,7 @@ void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
* All current use cases for flushing the TLBs for a specific memslot
* are related to dirty logging, and many do the TLB flush out of
* mmu_lock. The interaction between the various operations on memslot
- * must be serialized by slots_locks to ensure the TLB flush from one
+ * must be serialized by slots_lock to ensure the TLB flush from one
* operation is observed by any other operation on the same memslot.
*/
lockdep_assert_held(&kvm->slots_lock);
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (6 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-04-30 21:33 ` David Hildenbrand
` (4 more replies)
2025-04-30 16:56 ` [PATCH v8 09/13] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
` (4 subsequent siblings)
12 siblings, 5 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Add support for mmap() and fault() of guest_memfd backed memory in the
host, for VMs that support in-place conversion between shared and
private memory. To that end, this patch adds the ability to check
whether the VM type supports in-place conversion, and only allows
mapping its memory if that is the case.
This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
which enables support for in-place shared memory.
It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
indicates that the host can create VMs that support shared memory.
Supporting shared memory implies that memory can be mapped when shared
with the host.
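To illustrate, a minimal userspace sketch of how a VMM might use this
(map_gmem() is a hypothetical helper, not part of this series; error
handling is elided, and the mapping must be MAP_SHARED since private
mappings are rejected):

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/* vm_fd: VM fd from KVM_CREATE_VM, with a type that supports gmem. */
static void *map_gmem(int vm_fd, uint64_t size)
{
	struct kvm_create_guest_memfd gmem = {
		.size = size,
		.flags = 0,
	};
	int gmem_fd;

	/* New capability advertised by this patch. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM) <= 0)
		return MAP_FAILED;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	if (gmem_fd < 0)
		return MAP_FAILED;

	/* Faults on this mapping are served by kvm_gmem_fault_shared(). */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    gmem_fd, 0);
}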
Signed-off-by: Fuad Tabba <tabba@google.com>
---
include/linux/kvm_host.h | 15 ++++++-
include/uapi/linux/kvm.h | 1 +
virt/kvm/Kconfig | 5 +++
virt/kvm/guest_memfd.c | 92 ++++++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 4 ++
5 files changed, 116 insertions(+), 1 deletion(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9419fb99f7c2..f3af6bff3232 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -729,6 +729,17 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
}
#endif
+/*
+ * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
+ * private memory is enabled and it supports in-place shared/private conversion.
+ */
+#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
+static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
+{
+ return false;
+}
+#endif
+
#ifndef kvm_arch_has_readonly_mem
static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
{
@@ -2516,7 +2527,9 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
{
- /* For now, only private memory gets consumed from guest_memfd. */
+ if (kvm_arch_gmem_supports_shared_mem(kvm))
+ return true;
+
return kvm_mem_is_private(kvm, gfn);
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6ae8ad8934b..8bc8046c7f3a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -930,6 +930,7 @@ struct kvm_enable_cap {
#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
#define KVM_CAP_X86_GUEST_MODE 238
#define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
+#define KVM_CAP_GMEM_SHARED_MEM 240
struct kvm_irq_routing_irqchip {
__u32 irqchip;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 559c93ad90be..f4e469a62a60 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
config HAVE_KVM_ARCH_GMEM_INVALIDATE
bool
depends on KVM_GMEM
+
+config KVM_GMEM_SHARED_MEM
+ select KVM_GMEM
+ bool
+ prompt "Enables in-place shared memory for guest_memfd"
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6db515833f61..8bc8fc991d58 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -312,7 +312,99 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
return gfn - slot->base_gfn + slot->gmem.pgoff;
}
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+/*
+ * Returns true if the folio is shared with the host and the guest.
+ */
+static bool kvm_gmem_offset_is_shared(struct file *file, pgoff_t index)
+{
+ struct kvm_gmem *gmem = file->private_data;
+
+ /* For now, VMs that support shared memory share all their memory. */
+ return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
+}
+
+static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
+{
+ struct inode *inode = file_inode(vmf->vma->vm_file);
+ struct folio *folio;
+ vm_fault_t ret = VM_FAULT_LOCKED;
+
+ filemap_invalidate_lock_shared(inode->i_mapping);
+
+ folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+ if (IS_ERR(folio)) {
+ int err = PTR_ERR(folio);
+
+ if (err == -EAGAIN)
+ ret = VM_FAULT_RETRY;
+ else
+ ret = vmf_error(err);
+
+ goto out_filemap;
+ }
+
+ if (folio_test_hwpoison(folio)) {
+ ret = VM_FAULT_HWPOISON;
+ goto out_folio;
+ }
+
+ if (!kvm_gmem_offset_is_shared(vmf->vma->vm_file, vmf->pgoff)) {
+ ret = VM_FAULT_SIGBUS;
+ goto out_folio;
+ }
+
+ if (WARN_ON_ONCE(folio_test_large(folio))) {
+ ret = VM_FAULT_SIGBUS;
+ goto out_folio;
+ }
+
+ if (!folio_test_uptodate(folio)) {
+ clear_highpage(folio_page(folio, 0));
+ kvm_gmem_mark_prepared(folio);
+ }
+
+ vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+ if (ret != VM_FAULT_LOCKED) {
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
+out_filemap:
+ filemap_invalidate_unlock_shared(inode->i_mapping);
+
+ return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+ .fault = kvm_gmem_fault_shared,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct kvm_gmem *gmem = file->private_data;
+
+ if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
+ return -ENODEV;
+
+ if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+ (VM_SHARED | VM_MAYSHARE)) {
+ return -EINVAL;
+ }
+
+ vm_flags_set(vma, VM_DONTDUMP);
+ vma->vm_ops = &kvm_gmem_vm_ops;
+
+ return 0;
+}
+#else
+#define kvm_gmem_mmap NULL
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
static struct file_operations kvm_gmem_fops = {
+ .mmap = kvm_gmem_mmap,
.open = generic_file_open,
.release = kvm_gmem_release,
.fallocate = kvm_gmem_fallocate,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6289ea1685dd..c75d8e188eb7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4845,6 +4845,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
#ifdef CONFIG_KVM_GMEM
case KVM_CAP_GUEST_MEMFD:
return !kvm || kvm_arch_supports_gmem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+ case KVM_CAP_GMEM_SHARED_MEM:
+ return !kvm || kvm_arch_gmem_supports_shared_mem(kvm);
#endif
default:
break;
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 09/13] KVM: arm64: Refactor user_mem_abort() calculation of force_pte
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (7 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-04-30 21:35 ` David Hildenbrand
2025-04-30 16:56 ` [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
` (3 subsequent siblings)
12 siblings, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
To simplify the code and to make the assumptions clearer,
refactor user_mem_abort() by immediately setting force_pte to
true if the conditions are met. Also, remove the comment about
logging_active being guaranteed to never be true for VM_PFNMAP
memslots, since it's not actually correct.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/mmu.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 754f2fe0cc67..148a97c129de 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1472,7 +1472,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
bool fault_is_perm)
{
int ret = 0;
- bool write_fault, writable, force_pte = false;
+ bool write_fault, writable;
bool exec_fault, mte_allowed;
bool device = false, vfio_allow_any_uc = false;
unsigned long mmu_seq;
@@ -1484,6 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
gfn_t gfn;
kvm_pfn_t pfn;
bool logging_active = memslot_is_logging(memslot);
+ bool force_pte = logging_active || is_protected_kvm_enabled();
long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
@@ -1533,16 +1534,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}
- /*
- * logging_active is guaranteed to never be true for VM_PFNMAP
- * memslots.
- */
- if (logging_active || is_protected_kvm_enabled()) {
- force_pte = true;
+ if (force_pte)
vma_shift = PAGE_SHIFT;
- } else {
+ else
vma_shift = get_vma_page_shift(vma, hva);
- }
switch (vma_shift) {
#ifndef __PAGETABLE_PMD_FOLDED
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (8 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 09/13] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-05-09 20:15 ` James Houghton
2025-04-30 16:56 ` [PATCH v8 11/13] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
` (2 subsequent siblings)
12 siblings, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Add arm64 support for handling guest page faults on guest_memfd
backed memslots.
For now, the fault granule is restricted to PAGE_SIZE.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/mmu.c | 65 +++++++++++++++++++++++++++-------------
include/linux/kvm_host.h | 5 ++++
virt/kvm/kvm_main.c | 5 ----
3 files changed, 50 insertions(+), 25 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 148a97c129de..d1044c7f78bb 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1466,6 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
return vma->vm_flags & VM_MTE_ALLOWED;
}
+static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, bool write_fault, bool *writable,
+ struct page **page, bool is_gmem)
+{
+ kvm_pfn_t pfn;
+ int ret;
+
+ if (!is_gmem)
+ return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
+
+ *writable = false;
+
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
+ if (!ret) {
+ *writable = !memslot_is_readonly(slot);
+ return pfn;
+ }
+
+ if (ret == -EHWPOISON)
+ return KVM_PFN_ERR_HWPOISON;
+
+ return KVM_PFN_ERR_NOSLOT_MASK;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1473,19 +1497,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
{
int ret = 0;
bool write_fault, writable;
- bool exec_fault, mte_allowed;
+ bool exec_fault, mte_allowed = false;
bool device = false, vfio_allow_any_uc = false;
unsigned long mmu_seq;
phys_addr_t ipa = fault_ipa;
struct kvm *kvm = vcpu->kvm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma = NULL;
short vma_shift;
void *memcache;
- gfn_t gfn;
+ gfn_t gfn = ipa >> PAGE_SHIFT;
kvm_pfn_t pfn;
bool logging_active = memslot_is_logging(memslot);
- bool force_pte = logging_active || is_protected_kvm_enabled();
- long vma_pagesize, fault_granule;
+ bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
+ bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
+ long vma_pagesize, fault_granule = PAGE_SIZE;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
struct page *page;
@@ -1522,16 +1547,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return ret;
}
+ mmap_read_lock(current->mm);
+
/*
* Let's check if we will get back a huge page backed by hugetlbfs, or
* get block mapping for device MMIO region.
*/
- mmap_read_lock(current->mm);
- vma = vma_lookup(current->mm, hva);
- if (unlikely(!vma)) {
- kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
- mmap_read_unlock(current->mm);
- return -EFAULT;
+ if (!is_gmem) {
+ vma = vma_lookup(current->mm, hva);
+ if (unlikely(!vma)) {
+ kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+ mmap_read_unlock(current->mm);
+ return -EFAULT;
+ }
+
+ vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
+ mte_allowed = kvm_vma_mte_allowed(vma);
}
if (force_pte)
@@ -1602,18 +1633,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
ipa &= ~(vma_pagesize - 1);
}
- gfn = ipa >> PAGE_SHIFT;
- mte_allowed = kvm_vma_mte_allowed(vma);
-
- vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
-
/* Don't use the VMA after the unlock -- it may have vanished */
vma = NULL;
/*
* Read mmu_invalidate_seq so that KVM can detect if the results of
- * vma_lookup() or __kvm_faultin_pfn() become stale prior to
- * acquiring kvm->mmu_lock.
+ * vma_lookup() or faultin_pfn() become stale prior to acquiring
+ * kvm->mmu_lock.
*
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
* with the smp_wmb() in kvm_mmu_invalidate_end().
@@ -1621,8 +1647,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmu_seq = vcpu->kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
- pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
- &writable, &page);
+ pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3af6bff3232..1b2e4e9a7802 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
return gfn_to_memslot(kvm, gfn)->id;
}
+static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
+{
+ return slot->flags & KVM_MEM_READONLY;
+}
+
static inline gfn_t
hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
{
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c75d8e188eb7..d9bca5ba19dc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2640,11 +2640,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
return size;
}
-static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
-{
- return slot->flags & KVM_MEM_READONLY;
-}
-
static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
gfn_t *nr_pages, bool write)
{
--
2.49.0.901.g37484f566f-goog
* [PATCH v8 11/13] KVM: arm64: Enable mapping guest_memfd in arm64
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (9 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-05-09 21:08 ` James Houghton
2025-04-30 16:56 ` [PATCH v8 12/13] KVM: x86: KVM_X86_SW_PROTECTED_VM to support guest_memfd shared memory Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 13/13] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
12 siblings, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Enable mapping guest_memfd-backed memory on arm64. For now, this
applies to all arm64 VMs that use guest_memfd. In the future, new VM
types can restrict this via kvm_arch_gmem_supports_shared_mem().
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm64/kvm/Kconfig | 1 +
2 files changed, 13 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 08ba91e6fb03..1b1753e8021a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1593,4 +1593,16 @@ static inline bool kvm_arch_has_irq_bypass(void)
return true;
}
+#ifdef CONFIG_KVM_GMEM
+static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
+{
+ return IS_ENABLED(CONFIG_KVM_GMEM);
+}
+
+static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
+{
+ return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
+}
+#endif /* CONFIG_KVM_GMEM */
+
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 096e45acadb2..8c1e1964b46a 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
select HAVE_KVM_VCPU_RUN_PID_CHANGE
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
+ select KVM_GMEM_SHARED_MEM
help
Support hosting virtualized guest machines.
--
2.49.0.901.g37484f566f-goog
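As a hypothetical illustration of the "new VM types can restrict this"
point above: a future implementation could gate the hook on the VM type,
mirroring the x86 hook in the next patch. The arch.vm_type field and the
KVM_VM_TYPE_ARM_SW_PROTECTED name below are assumed for illustration and
are not part of this series.

static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
{
	/* Assumed future shape: all VM types except a new protected one. */
	return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&
	       kvm->arch.vm_type != KVM_VM_TYPE_ARM_SW_PROTECTED;
}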
* [PATCH v8 12/13] KVM: x86: KVM_X86_SW_PROTECTED_VM to support guest_memfd shared memory
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (10 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 11/13] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 13/13] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
12 siblings, 0 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
The KVM_X86_SW_PROTECTED_VM type is meant for experimentation and does
not have underlying support for protected guests. This makes it a good
candidate for testing the mapping of shared memory. Therefore, mark
this type as supporting shared memory, but only when the kconfig option
for in-place shared memory (KVM_GMEM_SHARED_MEM) is enabled.
This means that guest_memfd considers this memory to be shared with the
host, which is now able to map and fault in guest_memfd memory
belonging to this VM type.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 709cc2a7ba66..1858dde449c3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2255,8 +2255,13 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
#ifdef CONFIG_KVM_GMEM
#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
+
+#define kvm_arch_gmem_supports_shared_mem(kvm) \
+ (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) && \
+ ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM))
#else
#define kvm_arch_supports_gmem(kvm) false
+#define kvm_arch_gmem_supports_shared_mem(kvm) false
#endif
#define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
--
2.49.0.901.g37484f566f-goog
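Putting patches 8 and 12 together from userspace's point of view, a
minimal sketch (error handling trimmed; KVM_CAP_GMEM_SHARED_MEM comes
from this series, the rest is standard KVM uAPI; this is illustrative,
not taken from the series):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int map_sw_protected_gmem(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, KVM_X86_SW_PROTECTED_VM);
	struct kvm_create_guest_memfd args = { .size = 0x4000, .flags = 0 };
	void *mem;
	int fd;

	if (!ioctl(vm, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM))
		return -1;	/* kernel lacks KVM_GMEM_SHARED_MEM */

	fd = ioctl(vm, KVM_CREATE_GUEST_MEMFD, &args);
	mem = mmap(NULL, args.size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return mem == MAP_FAILED ? -1 : 0;
}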
* [PATCH v8 13/13] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (11 preceding siblings ...)
2025-04-30 16:56 ` [PATCH v8 12/13] KVM: x86: KVM_X86_SW_PROTECTED_VM to support guest_memfd shared memory Fuad Tabba
@ 2025-04-30 16:56 ` Fuad Tabba
12 siblings, 0 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-04-30 16:56 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Expand the guest_memfd selftests to cover mapping guest memory for VM
types that support it.
Also, build the guest_memfd selftest for arm64.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 75 +++++++++++++++++--
2 files changed, 70 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..ccf95ed037c3 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -163,6 +163,7 @@ TEST_GEN_PROGS_arm64 += access_tracking_perf_test
TEST_GEN_PROGS_arm64 += arch_timer
TEST_GEN_PROGS_arm64 += coalesced_io_test
TEST_GEN_PROGS_arm64 += dirty_log_perf_test
+TEST_GEN_PROGS_arm64 += guest_memfd_test
TEST_GEN_PROGS_arm64 += get-reg-list
TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
TEST_GEN_PROGS_arm64 += memslot_perf_test
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..bd35b56c90dc 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -34,12 +34,48 @@ static void test_file_read_write(int fd)
"pwrite on a guest_mem fd should fail");
}
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_allowed(int fd, size_t total_size)
{
+ size_t page_size = getpagesize();
+ const char val = 0xaa;
+ char *mem;
+ size_t i;
+ int ret;
+
+ mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ TEST_ASSERT(mem != MAP_FAILED, "mmap() on guest memory should succeed.");
+
+ memset(mem, val, total_size);
+ for (i = 0; i < total_size; i++)
+ TEST_ASSERT_EQ(mem[i], val);
+
+ ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+ page_size);
+ TEST_ASSERT(!ret, "fallocate the first page should succeed");
+
+ for (i = 0; i < page_size; i++)
+ TEST_ASSERT_EQ(mem[i], 0x00);
+ for (; i < total_size; i++)
+ TEST_ASSERT_EQ(mem[i], val);
+
+ memset(mem, val, total_size);
+ for (i = 0; i < total_size; i++)
+ TEST_ASSERT_EQ(mem[i], val);
+
+ ret = munmap(mem, total_size);
+ TEST_ASSERT(!ret, "munmap should succeed");
+}
+
+static void test_mmap_denied(int fd, size_t total_size)
+{
+ size_t page_size = getpagesize();
char *mem;
mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+ mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ TEST_ASSERT_EQ(mem, MAP_FAILED);
}
static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -170,19 +206,27 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
close(fd1);
}
-int main(int argc, char *argv[])
+unsigned long get_shared_type(void)
{
- size_t page_size;
+#ifdef __x86_64__
+ return KVM_X86_SW_PROTECTED_VM;
+#endif
+ return 0;
+}
+
+void test_vm_type(unsigned long type, bool is_shared)
+{
+ struct kvm_vm *vm;
size_t total_size;
+ size_t page_size;
int fd;
- struct kvm_vm *vm;
TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
page_size = getpagesize();
total_size = page_size * 4;
- vm = vm_create_barebones();
+ vm = vm_create_barebones_type(type);
test_create_guest_memfd_invalid(vm);
test_create_guest_memfd_multiple(vm);
@@ -190,10 +234,29 @@ int main(int argc, char *argv[])
fd = vm_create_guest_memfd(vm, total_size, 0);
test_file_read_write(fd);
- test_mmap(fd, page_size);
+
+ if (is_shared)
+ test_mmap_allowed(fd, total_size);
+ else
+ test_mmap_denied(fd, total_size);
+
test_file_size(fd, page_size, total_size);
test_fallocate(fd, page_size, total_size);
test_invalid_punch_hole(fd, page_size, total_size);
close(fd);
+ kvm_vm_release(vm);
+}
+
+int main(int argc, char *argv[])
+{
+#ifndef __aarch64__
+ /* For now, arm64 only supports shared guest memory. */
+ test_vm_type(VM_TYPE_DEFAULT, false);
+#endif
+
+ if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
+ test_vm_type(get_shared_type(), true);
+
+ return 0;
}
--
2.49.0.901.g37484f566f-goog
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-04-30 16:56 ` [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups Fuad Tabba
@ 2025-04-30 18:58 ` Ackerley Tng
2025-05-01 9:53 ` Fuad Tabba
2025-05-02 15:04 ` David Hildenbrand
2025-05-01 21:38 ` Ira Weiny
1 sibling, 2 replies; 63+ messages in thread
From: Ackerley Tng @ 2025-04-30 18:58 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, tabba
Fuad Tabba <tabba@google.com> writes:
> Until now, faults to private memory backed by guest_memfd are always
> consumed from guest_memfd whereas faults to shared memory are consumed
> from anonymous memory. Subsequent patches will allow sharing guest_memfd
> backed memory in-place, and mapping it by the host. Faults to in-place
> shared memory should be consumed from guest_memfd as well.
>
> In order to facilitate that, generalize the fault lookups. Currently,
> only private memory is consumed from guest_memfd and therefore as it
> stands, this patch does not change the behavior.
>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 19 +++++++++----------
> include/linux/kvm_host.h | 6 ++++++
> 2 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6d5dd869c890..08eebd24a0e1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3258,7 +3258,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>
> static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> const struct kvm_memory_slot *slot,
> - gfn_t gfn, int max_level, bool is_private)
> + gfn_t gfn, int max_level, bool is_gmem)
> {
> struct kvm_lpage_info *linfo;
> int host_level;
> @@ -3270,7 +3270,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> break;
> }
>
> - if (is_private)
> + if (is_gmem)
> return max_level;
I think this renaming isn't quite accurate.
IIUC in __kvm_mmu_max_mapping_level(), we skip considering
host_pfn_mapping_level() if the gfn is private because private memory
will not be mapped to userspace, so there's no need to query userspace
page tables in host_pfn_mapping_level().
Renaming is_private to is_gmem in this function implies that as long as
gmem is used, especially for shared pages from gmem, lpage_info will
always be updated and there's no need to query userspace page tables.
>
> if (max_level == PG_LEVEL_4K)
> @@ -3283,10 +3283,9 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> int kvm_mmu_max_mapping_level(struct kvm *kvm,
> const struct kvm_memory_slot *slot, gfn_t gfn)
> {
> - bool is_private = kvm_slot_has_gmem(slot) &&
> - kvm_mem_is_private(kvm, gfn);
> + bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);
This renaming should probably be undone too.
>
> - return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
> + return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_gmem);
> }
>
> void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> @@ -4465,7 +4464,7 @@ static inline u8 kvm_max_level_for_order(int order)
> return PG_LEVEL_4K;
> }
>
> -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> +static u8 kvm_max_gmem_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> u8 max_level, int gmem_order)
> {
> u8 req_max_level;
> @@ -4491,7 +4490,7 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
> r == RET_PF_RETRY, fault->map_writable);
> }
>
> -static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> +static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault)
> {
> int max_order, r;
> @@ -4509,8 +4508,8 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> }
>
> fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> - fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
> - fault->max_level, max_order);
> + fault->max_level = kvm_max_gmem_mapping_level(vcpu->kvm, fault->pfn,
> + fault->max_level, max_order);
>
> return RET_PF_CONTINUE;
> }
> @@ -4521,7 +4520,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> unsigned int foll = fault->write ? FOLL_WRITE : 0;
>
> if (fault->is_private)
> - return kvm_mmu_faultin_pfn_private(vcpu, fault);
> + return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>
> foll |= FOLL_NOWAIT;
> fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d9616ee6acc7..cdcd7ac091b5 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2514,6 +2514,12 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> }
> #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
>
> +static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
> +{
> + /* For now, only private memory gets consumed from guest_memfd. */
> + return kvm_mem_is_private(kvm, gfn);
> +}
Can I understand this function as "should fault from gmem"? And hence
also "was faulted from gmem"?
After this entire patch series, for arm64, KVM will always service stage
2 faults from gmem.
Perhaps this function should retain your suggested name of
kvm_mem_from_gmem() but only depend on
kvm_arch_gmem_supports_shared_mem(), since this patch series doesn't
update the MMU in X86. So something like this,
+static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
+{
+ return kvm_arch_gmem_supports_shared_mem(kvm);
+}
with the only usage in arm64.
When the MMU code for X86 is updated, we could then update the above
with
static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
{
- return kvm_arch_gmem_supports_shared_mem(kvm);
+ return kvm_arch_gmem_supports_shared_mem(kvm) ||
+ kvm_gmem_should_always_use_gmem(gfn_to_memslot(kvm, gfn)->gmem.file) ||
+ kvm_mem_is_private(kvm, gfn);
}
where kvm_gmem_should_always_use_gmem() will read a guest_memfd flag?
> +
> #ifdef CONFIG_KVM_GMEM
> int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
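One possible shape for the kvm_gmem_should_always_use_gmem() helper
proposed above, assuming a hypothetical per-file flag recorded at
guest_memfd creation time (the flags field and the
GUEST_MEMFD_FLAG_ALWAYS_USE name are made up for illustration):

static inline bool kvm_gmem_should_always_use_gmem(struct file *file)
{
	struct kvm_gmem *gmem = file->private_data;

	return gmem->flags & GUEST_MEMFD_FLAG_ALWAYS_USE;
}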
* Re: [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock
2025-04-30 16:56 ` [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock Fuad Tabba
@ 2025-04-30 21:30 ` David Hildenbrand
2025-05-01 21:43 ` Ira Weiny
1 sibling, 0 replies; 63+ messages in thread
From: David Hildenbrand @ 2025-04-30 21:30 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
On 30.04.25 18:56, Fuad Tabba wrote:
> Fix comments so that they refer to slots_lock instead of slots_locks
> (remove trailing s).
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-30 16:56 ` [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
@ 2025-04-30 21:33 ` David Hildenbrand
2025-05-01 8:07 ` Fuad Tabba
2025-05-02 15:11 ` David Hildenbrand
` (3 subsequent siblings)
4 siblings, 1 reply; 63+ messages in thread
From: David Hildenbrand @ 2025-04-30 21:33 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
On 30.04.25 18:56, Fuad Tabba wrote:
> Add support for mmap() and fault() for guest_memfd backed memory
> in the host for VMs that support in-place conversion between
> shared and private. To that end, this patch adds the ability to
> check whether the VM type supports in-place conversion, and only
> allows mapping its memory if that's the case.
>
> This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
> which enables support for in-place shared memory.
>
> It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> indicates that the host can create VMs that support shared memory.
> Supporting shared memory implies that memory can be mapped when shared
> with the host.
I think you should clarify here that it's not about "supports in-place
conversion" in the context of this series.
It's about mapping shared pages only; initially, we'll introduce the
option to only have shared memory in guest memfd, and later we'll
introduce the option for in-place conversion.
--
Cheers,
David / dhildenb
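For readers without the patch at hand, the mmap()/fault() support under
discussion boils down to something like the sketch below.
kvm_gmem_offset_is_shared() is the patch's helper; the rest is a
simplified approximation rather than the series' exact code:

static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
{
	struct inode *inode = file_inode(vmf->vma->vm_file);
	struct folio *folio;

	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
	if (IS_ERR(folio))
		return VM_FAULT_SIGBUS;

	/* Only offsets currently shared with the host may be faulted in. */
	if (!kvm_gmem_offset_is_shared(vmf->vma->vm_file, vmf->pgoff)) {
		folio_unlock(folio);
		folio_put(folio);
		return VM_FAULT_SIGBUS;
	}

	vmf->page = folio_file_page(folio, vmf->pgoff);
	return VM_FAULT_LOCKED;
}

static const struct vm_operations_struct kvm_gmem_vm_ops = {
	.fault = kvm_gmem_fault,
};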
* Re: [PATCH v8 09/13] KVM: arm64: Refactor user_mem_abort() calculation of force_pte
2025-04-30 16:56 ` [PATCH v8 09/13] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
@ 2025-04-30 21:35 ` David Hildenbrand
0 siblings, 0 replies; 63+ messages in thread
From: David Hildenbrand @ 2025-04-30 21:35 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
On 30.04.25 18:56, Fuad Tabba wrote:
> To simplify the code and to make the assumptions clearer,
> refactor user_mem_abort() by immediately setting force_pte to
> true if the conditions are met. Also, remove the comment about
> logging_active being guaranteed to never be true for VM_PFNMAP
> memslots, since it's not actually correct.
>
> No functional change intended.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-30 21:33 ` David Hildenbrand
@ 2025-05-01 8:07 ` Fuad Tabba
0 siblings, 0 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-05-01 8:07 UTC (permalink / raw)
To: David Hildenbrand
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On Wed, 30 Apr 2025 at 22:33, David Hildenbrand <david@redhat.com> wrote:
>
> On 30.04.25 18:56, Fuad Tabba wrote:
> > Add support for mmap() and fault() for guest_memfd backed memory
> > in the host for VMs that support in-place conversion between
> > shared and private. To that end, this patch adds the ability to
> > check whether the VM type supports in-place conversion, and only
> > allows mapping its memory if that's the case.
> >
> > This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
> > which enables support for in-place shared memory.
> >
> > It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> > indicates that the host can create VMs that support shared memory.
> > Supporting shared memory implies that memory can be mapped when shared
> > with the host.
>
> I think you should clarify here that it's not about "supports in-place
> conversion" in the context of this series.
>
> It's about mapping shared pages only; initially, we'll introduce the
> option to only have shared memory in guest memfd, and later we'll
> introduce the option for in-place conversion.
That's right. I'll fix this.
Thanks,
/fuad
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-04-30 18:58 ` Ackerley Tng
@ 2025-05-01 9:53 ` Fuad Tabba
2025-05-02 15:04 ` David Hildenbrand
1 sibling, 0 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-05-01 9:53 UTC (permalink / raw)
To: Ackerley Tng
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
Hi Ackerley,
On Wed, 30 Apr 2025 at 19:58, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > Until now, faults to private memory backed by guest_memfd are always
> > consumed from guest_memfd whereas faults to shared memory are consumed
> > from anonymous memory. Subsequent patches will allow sharing guest_memfd
> > backed memory in-place, and mapping it by the host. Faults to in-place
> > shared memory should be consumed from guest_memfd as well.
> >
> > In order to facilitate that, generalize the fault lookups. Currently,
> > only private memory is consumed from guest_memfd and therefore as it
> > stands, this patch does not change the behavior.
> >
> > Co-developed-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 19 +++++++++----------
> > include/linux/kvm_host.h | 6 ++++++
> > 2 files changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 6d5dd869c890..08eebd24a0e1 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3258,7 +3258,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
> >
> > static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> > const struct kvm_memory_slot *slot,
> > - gfn_t gfn, int max_level, bool is_private)
> > + gfn_t gfn, int max_level, bool is_gmem)
> > {
> > struct kvm_lpage_info *linfo;
> > int host_level;
> > @@ -3270,7 +3270,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> > break;
> > }
> >
> > - if (is_private)
> > + if (is_gmem)
> > return max_level;
>
> I think this renaming isn't quite accurate.
>
> IIUC in __kvm_mmu_max_mapping_level(), we skip considering
> host_pfn_mapping_level() if the gfn is private because private memory
> will not be mapped to userspace, so there's no need to query userspace
> page tables in host_pfn_mapping_level().
>
> Renaming is_private to is_gmem in this function implies that as long as
> gmem is used, especially for shared pages from gmem, lpage_info will
> always be updated and there's no need to query userspace page tables.
>
I understand.
> >
> > if (max_level == PG_LEVEL_4K)
> > @@ -3283,10 +3283,9 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> > int kvm_mmu_max_mapping_level(struct kvm *kvm,
> > const struct kvm_memory_slot *slot, gfn_t gfn)
> > {
> > - bool is_private = kvm_slot_has_gmem(slot) &&
> > - kvm_mem_is_private(kvm, gfn);
> > + bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);
>
> This renaming should probably be undone too.
Ack.
> >
> > - return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
> > + return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_gmem);
> > }
> >
> > void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > @@ -4465,7 +4464,7 @@ static inline u8 kvm_max_level_for_order(int order)
> > return PG_LEVEL_4K;
> > }
> >
> > -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> > +static u8 kvm_max_gmem_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> > u8 max_level, int gmem_order)
> > {
> > u8 req_max_level;
> > @@ -4491,7 +4490,7 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
> > r == RET_PF_RETRY, fault->map_writable);
> > }
> >
> > -static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > +static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> > struct kvm_page_fault *fault)
> > {
> > int max_order, r;
> > @@ -4509,8 +4508,8 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > }
> >
> > fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> > - fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
> > - fault->max_level, max_order);
> > + fault->max_level = kvm_max_gmem_mapping_level(vcpu->kvm, fault->pfn,
> > + fault->max_level, max_order);
> >
> > return RET_PF_CONTINUE;
> > }
> > @@ -4521,7 +4520,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> > unsigned int foll = fault->write ? FOLL_WRITE : 0;
> >
> > if (fault->is_private)
> > - return kvm_mmu_faultin_pfn_private(vcpu, fault);
> > + return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
> >
> > foll |= FOLL_NOWAIT;
> > fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index d9616ee6acc7..cdcd7ac091b5 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2514,6 +2514,12 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > }
> > #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> >
> > +static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
> > +{
> > + /* For now, only private memory gets consumed from guest_memfd. */
> > + return kvm_mem_is_private(kvm, gfn);
> > +}
>
> Can I understand this function as "should fault from gmem"? And hence
> also "was faulted from gmem"?
>
> After this entire patch series, for arm64, KVM will always service stage
> 2 faults from gmem.
>
> Perhaps this function should retain your suggested name of
> kvm_mem_from_gmem() but only depend on
> kvm_arch_gmem_supports_shared_mem(), since this patch series doesn't
> update the MMU in X86. So something like this,
Ack.
> +static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
> +{
> + return kvm_arch_gmem_supports_shared_mem(kvm);
> +}
>
> with the only usage in arm64.
>
> When the MMU code for X86 is updated, we could then update the above
> with
>
> static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
> {
> - return kvm_arch_gmem_supports_shared_mem(kvm);
> + return kvm_arch_gmem_supports_shared_mem(kvm) ||
> + kvm_gmem_should_always_use_gmem(gfn_to_memslot(kvm, gfn)->gmem.file) ||
> + kvm_mem_is_private(kvm, gfn);
> }
>
> where kvm_gmem_should_always_use_gmem() will read a guest_memfd flag?
I'm not sure I follow this one... Could you please explain what you
mean a bit more?
Thanks,
/fuad
> > +
> > #ifdef CONFIG_KVM_GMEM
> > int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> > gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
* Re: [PATCH v8 01/13] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
2025-04-30 16:56 ` [PATCH v8 01/13] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
@ 2025-05-01 17:38 ` Ira Weiny
0 siblings, 0 replies; 63+ messages in thread
From: Ira Weiny @ 2025-05-01 17:38 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fuad Tabba wrote:
> The option KVM_PRIVATE_MEM enables guest_memfd in general. Subsequent
> patches add shared memory support to guest_memfd. Therefore, rename it
> to KVM_GMEM to make its purpose clearer.
>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
* Re: [PATCH v8 02/13] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
2025-04-30 16:56 ` [PATCH v8 02/13] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
@ 2025-05-01 18:10 ` Ira Weiny
2025-05-02 6:44 ` David Hildenbrand
0 siblings, 1 reply; 63+ messages in thread
From: Ira Weiny @ 2025-05-01 18:10 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fuad Tabba wrote:
> The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
> guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
> clearer.
I'm curious what generic means in this name?
FWICT if we are going to change the name, KVM_GMEM_POPULATE is a better
name.
Ira
* Re: [PATCH v8 03/13] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
2025-04-30 16:56 ` [PATCH v8 03/13] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
@ 2025-05-01 18:18 ` Ira Weiny
0 siblings, 0 replies; 63+ messages in thread
From: Ira Weiny @ 2025-05-01 18:18 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fuad Tabba wrote:
> The function kvm_arch_has_private_mem() is used to indicate whether
> guest_memfd is supported by the architecture, which until now has
> implied that it is private. To decouple guest_memfd support from
> whether the memory is private, rename this function to
> kvm_arch_supports_gmem().
>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
* Re: [PATCH v8 04/13] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-04-30 16:56 ` [PATCH v8 04/13] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
@ 2025-05-01 18:19 ` Ira Weiny
0 siblings, 0 replies; 63+ messages in thread
From: Ira Weiny @ 2025-05-01 18:19 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fuad Tabba wrote:
> The bool has_private_mem is used to indicate whether guest_memfd is
> supported. Rename it to supports_gmem to make its meaning clearer and to
> decouple memory being private from guest_memfd.
>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
* Re: [PATCH v8 05/13] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
2025-04-30 16:56 ` [PATCH v8 05/13] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
@ 2025-05-01 21:37 ` Ira Weiny
0 siblings, 0 replies; 63+ messages in thread
From: Ira Weiny @ 2025-05-01 21:37 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fuad Tabba wrote:
> The function kvm_slot_can_be_private() is used to check whether a memory
> slot is backed by guest_memfd. Rename it to kvm_slot_has_gmem() to make
> that clearer and to decouple memory being private from guest_memfd.
>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-04-30 16:56 ` [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups Fuad Tabba
2025-04-30 18:58 ` Ackerley Tng
@ 2025-05-01 21:38 ` Ira Weiny
1 sibling, 0 replies; 63+ messages in thread
From: Ira Weiny @ 2025-05-01 21:38 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fuad Tabba wrote:
> Until now, faults to private memory backed by guest_memfd are always
> consumed from guest_memfd whereas faults to shared memory are consumed
> from anonymous memory. Subsequent patches will allow sharing guest_memfd
> backed memory in-place, and mapping it by the host. Faults to in-place
> shared memory should be consumed from guest_memfd as well.
>
> In order to facilitate that, generalize the fault lookups. Currently,
> only private memory is consumed from guest_memfd and therefore as it
> stands, this patch does not change the behavior.
>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
* Re: [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock
2025-04-30 16:56 ` [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-04-30 21:30 ` David Hildenbrand
@ 2025-05-01 21:43 ` Ira Weiny
2025-05-02 12:07 ` Fuad Tabba
1 sibling, 1 reply; 63+ messages in thread
From: Ira Weiny @ 2025-05-01 21:43 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fuad Tabba wrote:
> Fix comments so that they refer to slots_lock instead of slots_locks
> (remove trailing s).
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
* Re: [PATCH v8 02/13] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
2025-05-01 18:10 ` Ira Weiny
@ 2025-05-02 6:44 ` David Hildenbrand
2025-05-02 14:24 ` Ira Weiny
0 siblings, 1 reply; 63+ messages in thread
From: David Hildenbrand @ 2025-05-02 6:44 UTC (permalink / raw)
To: Ira Weiny, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
On 01.05.25 20:10, Ira Weiny wrote:
> Fuad Tabba wrote:
>> The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
>> guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
>> clearer.
>
> I'm curious what generic means in this name?
It means that an architecture wants to use the generic version rather
than providing its own alternative implementation.
We frequently use that term in this context, see GENERIC_IOREMAP as one
example.
--
Cheers,
David / dhildenb
* Re: [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock
2025-05-01 21:43 ` Ira Weiny
@ 2025-05-02 12:07 ` Fuad Tabba
0 siblings, 0 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-05-02 12:07 UTC (permalink / raw)
To: Ira Weiny
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On Thu, 1 May 2025 at 22:43, Ira Weiny <ira.weiny@intel.com> wrote:
>
> Fuad Tabba wrote:
> > Fix comments so that they refer to slots_lock instead of slots_locks
> > (remove trailing s).
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Thank you for the reviews!
/fuad
> [snip]
* Re: [PATCH v8 02/13] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
2025-05-02 6:44 ` David Hildenbrand
@ 2025-05-02 14:24 ` Ira Weiny
0 siblings, 0 replies; 63+ messages in thread
From: Ira Weiny @ 2025-05-02 14:24 UTC (permalink / raw)
To: David Hildenbrand, Ira Weiny, Fuad Tabba, kvm, linux-arm-msm,
linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
David Hildenbrand wrote:
> On 01.05.25 20:10, Ira Weiny wrote:
> > Fuad Tabba wrote:
> >> The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
> >> guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
> >> clearer.
> >
> > I'm curious what generic means in this name?
>
> It means that an architecture wants to use the generic version rather
> than providing its own alternative implementation.
>
> We frequently use that term in this context, see GENERIC_IOREMAP as one
> example.
Ah ok. Thanks.
Ira
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-04-30 18:58 ` Ackerley Tng
2025-05-01 9:53 ` Fuad Tabba
@ 2025-05-02 15:04 ` David Hildenbrand
2025-05-02 16:21 ` Sean Christopherson
1 sibling, 1 reply; 63+ messages in thread
From: David Hildenbrand @ 2025-05-02 15:04 UTC (permalink / raw)
To: Ackerley Tng, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta
On 30.04.25 20:58, Ackerley Tng wrote:
> Fuad Tabba <tabba@google.com> writes:
>
>> Until now, faults to private memory backed by guest_memfd are always
>> consumed from guest_memfd whereas faults to shared memory are consumed
>> from anonymous memory. Subsequent patches will allow sharing guest_memfd
>> backed memory in-place, and mapping it by the host. Faults to in-place
>> shared memory should be consumed from guest_memfd as well.
>>
>> In order to facilitate that, generalize the fault lookups. Currently,
>> only private memory is consumed from guest_memfd and therefore as it
>> stands, this patch does not change the behavior.
>>
>> Co-developed-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>> ---
>> arch/x86/kvm/mmu/mmu.c | 19 +++++++++----------
>> include/linux/kvm_host.h | 6 ++++++
>> 2 files changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 6d5dd869c890..08eebd24a0e1 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3258,7 +3258,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>>
>> static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
>> const struct kvm_memory_slot *slot,
>> - gfn_t gfn, int max_level, bool is_private)
>> + gfn_t gfn, int max_level, bool is_gmem)
>> {
>> struct kvm_lpage_info *linfo;
>> int host_level;
>> @@ -3270,7 +3270,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
>> break;
>> }
>>
>> - if (is_private)
>> + if (is_gmem)
>> return max_level;
>
> I think this renaming isn't quite accurate.
After our discussion yesterday, does that still hold true?
>
> IIUC in __kvm_mmu_max_mapping_level(), we skip considering
> host_pfn_mapping_level() if the gfn is private because private memory
> will not be mapped to userspace, so there's no need to query userspace
> page tables in host_pfn_mapping_level().
I think the reason was that, for private memory, we won't be walking
the userspace page tables.
Once guest_memfd is also responsible for the shared part, why should
this check still be private-only, and why should we consider querying a
userspace mapping that might not even exist?
>
> Renaming is_private to is_gmem in this function implies that as long as
> gmem is used, especially for shared pages from gmem, lpage_info will
> always be updated and there's no need to query userspace page tables.
I don't see why shared memory from gmem should be treated
differently here. Can you clarify which issue you see?
>
>>
>> if (max_level == PG_LEVEL_4K)
>> @@ -3283,10 +3283,9 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
>> int kvm_mmu_max_mapping_level(struct kvm *kvm,
>> const struct kvm_memory_slot *slot, gfn_t gfn)
>> {
>> - bool is_private = kvm_slot_has_gmem(slot) &&
>> - kvm_mem_is_private(kvm, gfn);
>> + bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);
>
> This renaming should probably be undone too.
>
>>
>> - return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
>> + return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_gmem);
>> }
--
Cheers,
David / dhildenb
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-30 16:56 ` [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
2025-04-30 21:33 ` David Hildenbrand
@ 2025-05-02 15:11 ` David Hildenbrand
2025-05-02 22:06 ` Ackerley Tng
2025-05-02 22:29 ` Ackerley Tng
` (2 subsequent siblings)
4 siblings, 1 reply; 63+ messages in thread
From: David Hildenbrand @ 2025-05-02 15:11 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
On 30.04.25 18:56, Fuad Tabba wrote:
> Add support for mmap() and fault() for guest_memfd backed memory
> in the host for VMs that support in-place conversion between
> shared and private. To that end, this patch adds the ability to
> check whether the VM type supports in-place conversion, and only
> allows mapping its memory if that's the case.
>
> This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
> which enables support for in-place shared memory.
>
> It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> indicates that the host can create VMs that support shared memory.
> Supporting shared memory implies that memory can be mapped when shared
> with the host.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> include/linux/kvm_host.h | 15 ++++++-
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/Kconfig | 5 +++
> virt/kvm/guest_memfd.c | 92 ++++++++++++++++++++++++++++++++++++++++
> virt/kvm/kvm_main.c | 4 ++
> 5 files changed, 116 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9419fb99f7c2..f3af6bff3232 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -729,6 +729,17 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> }
> #endif
>
> +/*
> + * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
> + * private memory is enabled and it supports in-place shared/private conversion.
> + */
> +#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
> +static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
> +{
> + return false;
> +}
> +#endif
> +
> #ifndef kvm_arch_has_readonly_mem
> static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
> {
> @@ -2516,7 +2527,9 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>
> static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
> {
> - /* For now, only private memory gets consumed from guest_memfd. */
> + if (kvm_arch_gmem_supports_shared_mem(kvm))
> + return true;
After our discussion yesterday, am I correct that we will not be
querying the KVM capability, but instead the "SHARED_TRACKING" flag (or
however that flag ends up being called) on the underlying guest_memfd?
I assume the function would then look something like
if (!kvm_supports_gmem(kvm))
return false;
if (kvm_arch_gmem_supports_shared_mem(kvm))
return .. TBD, test the gmem flag for the slot via gfn
return kvm_mem_is_private(kvm, gfn);
> +
> return kvm_mem_is_private(kvm, gfn);
> }
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6ae8ad8934b..8bc8046c7f3a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -930,6 +930,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
> #define KVM_CAP_X86_GUEST_MODE 238
> #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
> +#define KVM_CAP_GMEM_SHARED_MEM 240
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..f4e469a62a60 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> config HAVE_KVM_ARCH_GMEM_INVALIDATE
> bool
> depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> + select KVM_GMEM
> + bool
> + prompt "Enables in-place shared memory for guest_memfd"
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..8bc8fc991d58 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,99 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> return gfn - slot->base_gfn + slot->gmem.pgoff;
> }
>
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +/*
> + * Returns true if the folio is shared with the host and the guest.
> + */
> +static bool kvm_gmem_offset_is_shared(struct file *file, pgoff_t index)
> +{
> + struct kvm_gmem *gmem = file->private_data;
> +
> + /* For now, VMs that support shared memory share all their memory. */
> + return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
Similar here: likely we want to check the guest_memfd flag.
--
Cheers,
David / dhildenb
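Spelling out the "TBD" branch in the sketch above, using the
kvm_gmem_get_index() and kvm_gmem_offset_is_shared() helpers quoted from
the patch; the memslot lookup is an assumption for illustration, not the
series' code:

static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
{
	struct kvm_memory_slot *slot;

	if (!kvm_arch_supports_gmem(kvm))
		return false;

	if (kvm_arch_gmem_supports_shared_mem(kvm)) {
		/* Assumed: consult the per-offset shared state in guest_memfd. */
		slot = gfn_to_memslot(kvm, gfn);
		return slot && kvm_slot_has_gmem(slot) &&
		       kvm_gmem_offset_is_shared(slot->gmem.file,
						 kvm_gmem_get_index(slot, gfn));
	}

	return kvm_mem_is_private(kvm, gfn);
}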
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-02 15:04 ` David Hildenbrand
@ 2025-05-02 16:21 ` Sean Christopherson
2025-05-02 22:00 ` Ackerley Tng
0 siblings, 1 reply; 63+ messages in thread
From: Sean Christopherson @ 2025-05-02 16:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini,
chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On Fri, May 02, 2025, David Hildenbrand wrote:
> On 30.04.25 20:58, Ackerley Tng wrote:
> > > - if (is_private)
> > > + if (is_gmem)
> > > return max_level;
> >
> > I think this renaming isn't quite accurate.
>
> After our discussion yesterday, does that still hold true?
No.
> > IIUC in __kvm_mmu_max_mapping_level(), we skip considering
> > host_pfn_mapping_level() if the gfn is private because private memory
> > will not be mapped to userspace, so there's no need to query userspace
> > page tables in host_pfn_mapping_level().
>
> I think the reason was that: for private we won't be walking the user space
> page tables.
>
> Once guest_memfd is also responsible for the shared part, why should this
> here still be private-only, and why should we consider querying a user space
> mapping that might not even exist?
+1, one of the big selling points for guest_memfd beyond CoCo is that it provides
guest-first memory. It is very explicitly an intended feature that the guest
mappings KVM creates can be a superset of the host userspace mappings. E.g. the
guest can use larger page sizes, have RW while the host has RO, etc.
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-02 16:21 ` Sean Christopherson
@ 2025-05-02 22:00 ` Ackerley Tng
2025-05-05 8:01 ` David Hildenbrand
0 siblings, 1 reply; 63+ messages in thread
From: Ackerley Tng @ 2025-05-02 22:00 UTC (permalink / raw)
To: Sean Christopherson, David Hildenbrand
Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
mpe, anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta
Sean Christopherson <seanjc@google.com> writes:
> On Fri, May 02, 2025, David Hildenbrand wrote:
>> On 30.04.25 20:58, Ackerley Tng wrote:
>> > > - if (is_private)
>> > > + if (is_gmem)
>> > > return max_level;
>> >
>> > I think this renaming isn't quite accurate.
>>
>> After our discussion yesterday, does that still hold true?
>
> No.
>
>> > IIUC in __kvm_mmu_max_mapping_level(), we skip considering
>> > host_pfn_mapping_level() if the gfn is private because private memory
>> > will not be mapped to userspace, so there's no need to query userspace
>> > page tables in host_pfn_mapping_level().
>>
>> I think the reason was that: for private we won't be walking the user space
>> page tables.
>>
>> Once guest_memfd is also responsible for the shared part, why should this
>> here still be private-only, and why should we consider querying a user space
>> mapping that might not even exist?
>
> +1, one of the big selling points for guest_memfd beyond CoCo is that it provides
> guest-first memory. It is very explicitly an intended feature that the guest
> mappings KVM creates can be a superset of the host userspace mappings. E.g. the
> guest can use larger page sizes, have RW while the host has RO, etc.
Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
the parameter renaming from is_private to is_gmem, do something like
if (is_gmem)
return kvm_gmem_get_max_mapping_level(slot, gfn);
and basically defer to gmem as long as gmem should be used for this gfn?
There is another call to __kvm_mmu_max_mapping_level() via
kvm_mmu_max_mapping_level() beginning from recover_huge_pages_range(),
and IIUC that doesn't go through guest_memfd.
Hence, unlike the call to __kvm_mmu_max_mapping_level() from the KVM x86
MMU fault path, guest_memfd didn't get a chance to provide its input in
the form of returning max_order from kvm_gmem_get_pfn().
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-05-02 15:11 ` David Hildenbrand
@ 2025-05-02 22:06 ` Ackerley Tng
0 siblings, 0 replies; 63+ messages in thread
From: Ackerley Tng @ 2025-05-02 22:06 UTC (permalink / raw)
To: David Hildenbrand, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta
David Hildenbrand <david@redhat.com> writes:
> On 30.04.25 18:56, Fuad Tabba wrote:
>> Add support for mmap() and fault() for guest_memfd backed memory
>> in the host for VMs that support in-place conversion between
>> shared and private. To that end, this patch adds the ability to
>> check whether the VM type supports in-place conversion, and only
>> allows mapping its memory if that's the case.
>>
>> This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
>> which enables support for in-place shared memory.
>>
>> It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
>> indicates that the host can create VMs that support shared memory.
>> Supporting shared memory implies that memory can be mapped when shared
>> with the host.
>>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>> ---
>> include/linux/kvm_host.h | 15 ++++++-
>> include/uapi/linux/kvm.h | 1 +
>> virt/kvm/Kconfig | 5 +++
>> virt/kvm/guest_memfd.c | 92 ++++++++++++++++++++++++++++++++++++++++
>> virt/kvm/kvm_main.c | 4 ++
>> 5 files changed, 116 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 9419fb99f7c2..f3af6bff3232 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -729,6 +729,17 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
>> }
>> #endif
>>
>> +/*
>> + * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
>> + * private memory is enabled and it supports in-place shared/private conversion.
>> + */
>> +#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
>> +static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
>> +{
>> + return false;
>> +}
>> +#endif
>> +
>> #ifndef kvm_arch_has_readonly_mem
>> static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>> {
>> @@ -2516,7 +2527,9 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>
>> static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
>> {
>> - /* For now, only private memory gets consumed from guest_memfd. */
>> + if (kvm_arch_gmem_supports_shared_mem(kvm))
>> + return true;
>
> After our discussion yesterday, am I correct that we will not be
> querying the KVM capability, but instead the "SHARED_TRACKING" flag (or
> however that flag ends up being called) on the underlying guest_memfd?
>
> I assume the function would then look something like
>
> if (!kvm_supports_gmem(kvm))
> return false;
> if (kvm_arch_gmem_supports_shared_mem(kvm))
> return .. TBD, test the gmem flag for the slot via gfn
> return kvm_mem_is_private(kvm, gfn);
>
Yes, I believe we're aligned here. I added a patch that will do this,
but it depends on other parts of the patch series and I think it's
better if you review it altogether when Fuad posts it (as opposed to
reviewing a snippet I drop here).
>> +
>> return kvm_mem_is_private(kvm, gfn);
>> }
>>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index b6ae8ad8934b..8bc8046c7f3a 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -930,6 +930,7 @@ struct kvm_enable_cap {
>> #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
>> #define KVM_CAP_X86_GUEST_MODE 238
>> #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
>> +#define KVM_CAP_GMEM_SHARED_MEM 240
>>
>> struct kvm_irq_routing_irqchip {
>> __u32 irqchip;
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 559c93ad90be..f4e469a62a60 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -128,3 +128,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>> config HAVE_KVM_ARCH_GMEM_INVALIDATE
>> bool
>> depends on KVM_GMEM
>> +
>> +config KVM_GMEM_SHARED_MEM
>> + select KVM_GMEM
>> + bool
>> + prompt "Enables in-place shared memory for guest_memfd"
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 6db515833f61..8bc8fc991d58 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -312,7 +312,99 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>> return gfn - slot->base_gfn + slot->gmem.pgoff;
>> }
>>
>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>> +/*
>> + * Returns true if the folio is shared with the host and the guest.
>> + */
>> +static bool kvm_gmem_offset_is_shared(struct file *file, pgoff_t index)
>> +{
>> + struct kvm_gmem *gmem = file->private_data;
>> +
>> + /* For now, VMs that support shared memory share all their memory. */
>> + return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
>
> Similar here: likely we want to check the guest_memfd flag.
>
> --
> Cheers,
>
> David / dhildenb
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-30 16:56 ` [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
2025-04-30 21:33 ` David Hildenbrand
2025-05-02 15:11 ` David Hildenbrand
@ 2025-05-02 22:29 ` Ackerley Tng
2025-05-06 8:47 ` Yan Zhao
2025-05-05 21:06 ` Ira Weiny
2025-05-09 20:54 ` James Houghton
4 siblings, 1 reply; 63+ messages in thread
From: Ackerley Tng @ 2025-05-02 22:29 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, david, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta, tabba
Fuad Tabba <tabba@google.com> writes:
> Add support for mmap() and fault() for guest_memfd backed memory
> in the host for VMs that support in-place conversion between
> shared and private. To that end, this patch adds the ability to
> check whether the VM type supports in-place conversion, and only
> allows mapping its memory if that's the case.
>
> This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
> which enables support for in-place shared memory.
>
> It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> indicates that the host can create VMs that support shared memory.
> Supporting shared memory implies that memory can be mapped when shared
> with the host.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> include/linux/kvm_host.h | 15 ++++++-
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/Kconfig | 5 +++
> virt/kvm/guest_memfd.c | 92 ++++++++++++++++++++++++++++++++++++++++
> virt/kvm/kvm_main.c | 4 ++
> 5 files changed, 116 insertions(+), 1 deletion(-)
>
> <snip>
At the guest_memfd call on 2025-05-01, we discussed that if guest_memfd
is created with GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then if
slot->userspace_addr != 0, we would validate that the folio
slot->userspace_addr points to matches up with the folio guest_memfd
would return for the same offset.
I can think of one way to do this validation, which is to call KVM's
hva_to_pfn() function and then call kvm_gmem_get_folio() on the fd and
offset, and then check that the PFNs are equal.
However, that would cause the page to be allocated. Any ideas on how we
could do this validation without allocating the page?
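For concreteness, a rough sketch of the allocation-prone variant I have
in mind (it substitutes get_user_pages_fast() for KVM's internal
hva_to_pfn(), assumes kvm_gmem_get_folio() returns the folio locked as
in guest_memfd.c, and the helper name is made up):

static int kvm_gmem_check_hva_matches(struct kvm_memory_slot *slot,
				      struct inode *inode, pgoff_t index)
{
	unsigned long hva = slot->userspace_addr +
			    ((index - slot->gmem.pgoff) << PAGE_SHIFT);
	struct folio *folio;
	struct page *page;
	int ret;

	if (get_user_pages_fast(hva, 1, 0, &page) != 1)
		return -EFAULT;

	/* This is the problem: allocates the folio if it isn't there yet. */
	folio = kvm_gmem_get_folio(inode, index);
	if (IS_ERR(folio)) {
		put_page(page);
		return PTR_ERR(folio);
	}

	ret = folio_pfn(folio) == page_to_pfn(page) ? 0 : -EINVAL;

	folio_unlock(folio);
	folio_put(folio);
	put_page(page);
	return ret;
}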
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-02 22:00 ` Ackerley Tng
@ 2025-05-05 8:01 ` David Hildenbrand
2025-05-05 22:57 ` Sean Christopherson
2025-05-05 23:09 ` Ackerley Tng
0 siblings, 2 replies; 63+ messages in thread
From: David Hildenbrand @ 2025-05-05 8:01 UTC (permalink / raw)
To: Ackerley Tng, Sean Christopherson
Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
mpe, anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta
On 03.05.25 00:00, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
>> On Fri, May 02, 2025, David Hildenbrand wrote:
>>> On 30.04.25 20:58, Ackerley Tng wrote:
>>>>> - if (is_private)
>>>>> + if (is_gmem)
>>>>> return max_level;
>>>>
>>>> I think this renaming isn't quite accurate.
>>>
>>> After our discussion yesterday, does that still hold true?
>>
>> No.
>>
>>>> IIUC in __kvm_mmu_max_mapping_level(), we skip considering
>>>> host_pfn_mapping_level() if the gfn is private because private memory
>>>> will not be mapped to userspace, so there's no need to query userspace
>>>> page tables in host_pfn_mapping_level().
>>>
>>> I think the reason was that: for private we won't be walking the user space
>>> page tables.
>>>
>>> Once guest_memfd is also responsible for the shared part, why should this
>>> here still be private-only, and why should we consider querying a user space
>>> mapping that might not even exist?
>>
>> +1, one of the big selling points for guest_memfd beyond CoCo is that it provides
>> guest-first memory. It is very explicitly an intended feature that the guest
>> mappings KVM creates can be a superset of the host userspace mappings. E.g. the
>> guest can use larger page sizes, have RW while the host has RO, etc.
>
> Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
> the parameter renaming from is_private to is_gmem, do something like
>
> if (is_gmem)
> return kvm_gmem_get_max_mapping_level(slot, gfn);
I assume you mean, not looking at lpage_info at all?
I have a limited understanding of what lpage_info is or what it does. I
believe all it adds is a mechanism to *disable* large page mappings.
We want to disable large pages if (using 2M region as example)
(a) Mixed memory attributes. If a PFN falls into a 2M region, and parts
of that region are shared vs. private (mixed memory attributes ->
KVM_LPAGE_MIXED_FLAG)
-> With gmem-shared we could have mixed memory attributes, not a PFN
fracturing. (PFNs don't depend on memory attributes)
(b) page track: intercepting (mostly write) access to GFNs
So, I wonder if we still have to take care of lpage_info, at least for
handling (b) correctly [I assume so]. Regarding (a) I am not sure we
still do, once memory attributes are handled by gmem in the gmem-shared
case. IIRC, with AMD SEV we might still have to honor it? But gmem
itself could handle that.
What we could definitely do here for now is:
if (is_gmem)
/* gmem only supports 4k pages for now. */
return PG_LEVEL_4K;
And not worry about lpage_info for the time being, until we actually do
support larger pages.
>
> and basically defer to gmem as long as gmem should be used for this gfn?
>
> There is another call to __kvm_mmu_max_mapping_level() via
> kvm_mmu_max_mapping_level() beginning from recover_huge_pages_range(),
> and IIUC that doesn't go through guest_memfd.
>
> Hence, unlike the call to __kvm_mmu_max_mapping_level() from the KVM x86
> MMU fault path, guest_memfd didn't get a chance to provide its input in
> the form of returning max_order from kvm_gmem_get_pfn().
Right, we essentially say that "this is a private fault", likely
assuming that we already verified earlier that the memory is also private.
[I can see that happening when the function is called through
direct_page_fault()]
We could simply call kvm_mmu_max_mapping_level() from
kvm_mmu_hugepage_adjust() I guess. (could possibly be optimized later)
--
Cheers,
David / dhildenb
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-30 16:56 ` [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
` (2 preceding siblings ...)
2025-05-02 22:29 ` Ackerley Tng
@ 2025-05-05 21:06 ` Ira Weiny
2025-05-06 12:15 ` Fuad Tabba
2025-05-09 20:54 ` James Houghton
4 siblings, 1 reply; 63+ messages in thread
From: Ira Weiny @ 2025-05-05 21:06 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
tabba
Fuad Tabba wrote:
> Add support for mmap() and fault() for guest_memfd backed memory
> in the host for VMs that support in-place conversion between
> shared and private. To that end, this patch adds the ability to
> check whether the VM type supports in-place conversion, and only
> allows mapping its memory if that's the case.
>
> This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
> which enables support for in-place shared memory.
>
> It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> indicates that the host can create VMs that support shared memory.
> Supporting shared memory implies that memory can be mapped when shared
> with the host.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> include/linux/kvm_host.h | 15 ++++++-
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/Kconfig | 5 +++
> virt/kvm/guest_memfd.c | 92 ++++++++++++++++++++++++++++++++++++++++
> virt/kvm/kvm_main.c | 4 ++
> 5 files changed, 116 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9419fb99f7c2..f3af6bff3232 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -729,6 +729,17 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> }
> #endif
>
> +/*
> + * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
> + * private memory is enabled and it supports in-place shared/private conversion.
> + */
> +#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
Perhaps the bots already caught this?
I just tried enabling KVM_GMEM_SHARED_MEM on x86 with this patch and it fails with:
|| In file included from arch/x86/kvm/../../../virt/kvm/binary_stats.c:8:
|| ./include/linux/kvm_host.h: In function ‘kvm_mem_from_gmem’:
include/linux/kvm_host.h|2530 col 13| error: implicit declaration of function ‘kvm_arch_gmem_supports_shared_mem’ [-Wimplicit-function-declaration]
|| 2530 | if (kvm_arch_gmem_supports_shared_mem(kvm))
|| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|| make[4]: *** Waiting for unfinished jobs....
I think the predicate on !CONFIG_KVM_GMEM_SHARED_MEM is wrong.
Shouldn't this always default off? I __think__ this then gets enabled in
11/13?
IOW
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3af6bff3232..577674e95c09 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -733,7 +733,7 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
* Arch code must define kvm_arch_gmem_supports_shared_mem if support for
* private memory is enabled and it supports in-place shared/private conversion.
*/
-#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
+#if !defined(kvm_arch_gmem_supports_shared_mem)
static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
{
return false;
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-05 8:01 ` David Hildenbrand
@ 2025-05-05 22:57 ` Sean Christopherson
2025-05-06 5:17 ` Vishal Annapurve
2025-05-06 19:27 ` Ackerley Tng
2025-05-05 23:09 ` Ackerley Tng
1 sibling, 2 replies; 63+ messages in thread
From: Sean Christopherson @ 2025-05-05 22:57 UTC (permalink / raw)
To: David Hildenbrand
Cc: Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini,
chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On Mon, May 05, 2025, David Hildenbrand wrote:
> On 03.05.25 00:00, Ackerley Tng wrote:
> > Sean Christopherson <seanjc@google.com> writes:
> >
> > > On Fri, May 02, 2025, David Hildenbrand wrote:
> > > > On 30.04.25 20:58, Ackerley Tng wrote:
> > > > > > - if (is_private)
> > > > > > + if (is_gmem)
> > > > > > return max_level;
> > > > >
> > > > > I think this renaming isn't quite accurate.
> > > >
> > > > After our discussion yesterday, does that still hold true?
> > >
> > > No.
> > >
> > > > > IIUC in __kvm_mmu_max_mapping_level(), we skip considering
> > > > > host_pfn_mapping_level() if the gfn is private because private memory
> > > > > will not be mapped to userspace, so there's no need to query userspace
> > > > > page tables in host_pfn_mapping_level().
> > > >
> > > > I think the reason was that: for private we won't be walking the user space
> > > > page tables.
> > > >
> > > > Once guest_memfd is also responsible for the shared part, why should this
> > > > here still be private-only, and why should we consider querying a user space
> > > > mapping that might not even exist?
> > >
> > > +1, one of the big selling points for guest_memfd beyond CoCo is that it provides
> > > guest-first memory. It is very explicitly an intended feature that the guest
> > > mappings KVM creates can be a superset of the host userspace mappings. E.g. the
> > > guest can use larger page sizes, have RW while the host has RO, etc.
> >
> > Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
> > the parameter renaming from is_private to is_gmem, do something like
> >
> > if (is_gmem)
> > return kvm_gmem_get_max_mapping_level(slot, gfn);
No, kvm_gmem_get_pfn() already provides the maximum allowed order; we "just" need
to update that to constrain the max order based on shared vs. private. E.g. from
the original guest_memfd hugepage support[*] (which never landed), to take care
of the pgoff not being properly aligned to the memslot.
+ /*
+ * The folio can be mapped with a hugepage if and only if the folio is
+ * fully contained by the range the memslot is bound to. Note, the
+ * caller is responsible for handling gfn alignment, this only deals
+ * with the file binding.
+ */
+ huge_index = ALIGN(index, 1ull << *max_order);
+ if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
+ huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
+ *max_order = 0;
[*] https://lore.kernel.org/all/20231027182217.3615211-18-seanjc@google.com
> I assume you mean, not looking at lpage_info at all?
>
> I have a limited understanding of what lpage_info is or what it does. I believe
> all it adds is a mechanism to *disable* large page mappings.
Correct. It's a bit of a catch-all that's used by a variety of KVM x86 features
to disable hugepages.
> We want to disable large pages if (using 2M region as example)
>
> (a) Mixed memory attributes. If a PFN falls into a 2M region, and parts
> of that region are shared vs. private (mixed memory attributes ->
> KVM_LPAGE_MIXED_FLAG)
>
> -> With gmem-shared we could have mixed memory attributes, not a PFN
> fracturing. (PFNs don't depend on memory attributes)
>
> (b) page track: intercepting (mostly write) access to GFNs
It's also used to handle misaligned memslots (or sizes), e.g. if a 1GiB memory
region spans 1GiB+4KiB => 2GiB+4KiB, KVM will disallow 1GiB hugepages, and 2MiB
hugepages for the head and tail. Or if the host virtual address isn't aligned
with the guest physical address (see above for guest_memfd's role when there is
no hva).
> So, I wonder if we still have to take care of lpage_info, at least for
> handling (b) correctly [I assume so].
Ya, we do.
> Regarding (a) I am not sure we still do, once memory attributes are handled by
> gmem in the gmem-shared case. IIRC, with AMD SEV we might still have to honor
> it? But gmem itself could handle that.
>
> What we could definitely do here for now is:
>
> if (is_gmem)
> /* gmem only supports 4k pages for now. */
> return PG_LEVEL_4K;
>
> And not worry about lpage_info for the time being, until we actually do
> support larger pages.
I don't want to completely punt on this, because if it gets messy, then I want
to know now and have a solution in hand, not find out N months from now.
That said, I don't expect it to be difficult. What we could punt on is
performance of the lookups, which is the real reason KVM maintains the rather
expensive disallow_lpage array.
And that said, memslots can only bind to one guest_memfd instance, so I don't
immediately see any reason why the guest_memfd ioctl() couldn't process the
slots that are bound to it. I.e. why not update KVM_LPAGE_MIXED_FLAG from the
guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
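E.g., following the pattern of kvm_gmem_invalidate_begin(), the ioctl
could walk the bindings xarray and poke every bound slot. Sketch only;
kvm_update_lpage_mixed() is a strawman name for the update helper:

static void kvm_gmem_update_lpage_info(struct kvm_gmem *gmem,
				       pgoff_t start, pgoff_t end)
{
	struct kvm_memory_slot *slot;
	unsigned long index;

	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
		pgoff_t pgoff = slot->gmem.pgoff;
		gfn_t gfn_start = slot->base_gfn + max(start, pgoff) - pgoff;
		gfn_t gfn_end = slot->base_gfn +
				min(end - pgoff, (pgoff_t)slot->npages);

		/* Strawman: recompute KVM_LPAGE_MIXED_FLAG for the range. */
		kvm_update_lpage_mixed(gmem->kvm, slot, gfn_start, gfn_end);
	}
}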
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-05 8:01 ` David Hildenbrand
2025-05-05 22:57 ` Sean Christopherson
@ 2025-05-05 23:09 ` Ackerley Tng
2025-05-05 23:17 ` Sean Christopherson
1 sibling, 1 reply; 63+ messages in thread
From: Ackerley Tng @ 2025-05-05 23:09 UTC (permalink / raw)
To: David Hildenbrand, Sean Christopherson
Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
mpe, anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta
David Hildenbrand <david@redhat.com> writes:
> On 03.05.25 00:00, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>>
>>> On Fri, May 02, 2025, David Hildenbrand wrote:
>>>> On 30.04.25 20:58, Ackerley Tng wrote:
>>>>>> - if (is_private)
>>>>>> + if (is_gmem)
>>>>>> return max_level;
>>>>>
>>>>> I think this renaming isn't quite accurate.
>>>>
>>>> After our discussion yesterday, does that still hold true?
>>>
>>> No.
>>>
>>>>> IIUC in __kvm_mmu_max_mapping_level(), we skip considering
>>>>> host_pfn_mapping_level() if the gfn is private because private memory
>>>>> will not be mapped to userspace, so there's no need to query userspace
>>>>> page tables in host_pfn_mapping_level().
>>>>
>>>> I think the reason was that: for private we won't be walking the user space
>>>> page tables.
>>>>
>>>> Once guest_memfd is also responsible for the shared part, why should this
>>>> here still be private-only, and why should we consider querying a user space
>>>> mapping that might not even exist?
>>>
>>> +1, one of the big selling points for guest_memfd beyond CoCo is that it provides
>>> guest-first memory. It is very explicitly an intended feature that the guest
>>> mappings KVM creates can be a superset of the host userspace mappings. E.g. the
>>> guest can use larger page sizes, have RW while the host has RO, etc.
>>
>> Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
>> the parameter renaming from is_private to is_gmem, do something like
>>
>> if (is_gmem)
>> return kvm_gmem_get_max_mapping_level(slot, gfn);
>
> I assume you mean, not looking at lpage_info at all?
>
My bad. I actually meant just to take input from guest_memfd and stop
there without checking with host page tables, perhaps something like
min(kvm_gmem_get_max_mapping_level(slot, gfn), max_level);
> I have a limited understanding of what lpage_info is or what it does. I
> believe all it adds is a mechanism to *disable* large page mappings.
>
This is my understanding too.
> We want to disable large pages if (using 2M region as example)
>
> (a) Mixed memory attributes. If a PFN falls into a 2M region, and parts
> of that region are shared vs. private (mixed memory attributes ->
> KVM_LPAGE_MIXED_FLAG)
>
> -> With gmem-shared we could have mixed memory attributes, not a PFN
> fracturing. (PFNs don't depend on memory attributes)
>
> (b) page track: intercepting (mostly write) access to GFNs
>
Could you explain more about the page track case?
>
> So, I wonder if we still have to take care of lpage_info, at least for
> handling (b) correctly [I assume so]. Regarding (a) I am not sure: once
> memory attributes are handled by gmem in the gmem-shared case. IIRC,
> with AMD SEV we might still have to honor it? But gmem itself could
> handle that.
>
For AMD SEV, I believe kvm_max_private_mapping_level() already takes
care of that, at least for the MMU faulting path [1], where guest_memfd
gives input using max_order, then arch-specific callback contributes input.
>
> What we could definitely do here for now is:
>
> if (is_gmem)
> /* gmem only supports 4k pages for now. */
> return PG_LEVEL_4K;
>
> And not worry about lpage_info for the time being, until we actually do
> support larger pages.
>
>
Perhaps this is better explained as an RFC in code. I'll put in a patch
as part of Fuad's series if Fuad doesn't mind.
>>
>> and basically defer to gmem as long as gmem should be used for this gfn?
>>
>> There is another call to __kvm_mmu_max_mapping_level() via
>> kvm_mmu_max_mapping_level() beginning from recover_huge_pages_range(),
>> and IIUC that doesn't go through guest_memfd.
>>
>> Hence, unlike the call to __kvm_mmu_max_mapping_level() from the KVM x86
>> MMU fault path, guest_memfd didn't get a chance to provide its input in
>> the form of returning max_order from kvm_gmem_get_pfn().
>
> Right, we essentially say that "this is a private fault", likely
> assuming that we already verified earlier that the memory is also private.
>
> [I can see that happening when the function is called through
> direct_page_fault()]
>
> We could simply call kvm_mmu_max_mapping_level() from
> kvm_mmu_hugepage_adjust() I guess. (could possibly be optimized later)
>
> --
> Cheers,
>
> David / dhildenb
[1] https://github.com/torvalds/linux/blob/01f95500a162fca88cefab9ed64ceded5afabc12/arch/x86/kvm/mmu/mmu.c#L4480
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-05 23:09 ` Ackerley Tng
@ 2025-05-05 23:17 ` Sean Christopherson
0 siblings, 0 replies; 63+ messages in thread
From: Sean Christopherson @ 2025-05-05 23:17 UTC (permalink / raw)
To: Ackerley Tng
Cc: David Hildenbrand, Fuad Tabba, kvm, linux-arm-msm, linux-mm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On Mon, May 05, 2025, Ackerley Tng wrote:
> > On 03.05.25 00:00, Ackerley Tng wrote:
> > We want to disable large pages if (using 2M region as example)
> >
> > (a) Mixed memory attributes. If a PFN falls into a 2M region, and parts
> > of that region are shared vs. private (mixed memory attributes ->
> > KVM_LPAGE_MIXED_FLAG)
> >
> > -> With gmem-shared we could have mixed memory attributes, not a PFN
> > fracturing. (PFNs don't depend on memory attributes)
> >
> > (b) page track: intercepting (mostly write) access to GFNs
> >
>
> Could you explain more about the page track case?
KVM disallows hugepages when shadowing a gfn, because write-protecting a 2MiB
(let alone a 1GiB) page would be insanely expensive, as KVM would need to intercept
and emulate an absurd number of instructions that have nothing to do with the
guest's page tables.
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-05 22:57 ` Sean Christopherson
@ 2025-05-06 5:17 ` Vishal Annapurve
2025-05-06 5:28 ` Vishal Annapurve
2025-05-06 19:27 ` Ackerley Tng
1 sibling, 1 reply; 63+ messages in thread
From: Vishal Annapurve @ 2025-05-06 5:17 UTC (permalink / raw)
To: Sean Christopherson
Cc: David Hildenbrand, Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm,
linux-mm, pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer,
aou, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta
On Mon, May 5, 2025 at 3:57 PM Sean Christopherson <seanjc@google.com> wrote:
> > ...
> > And not worry about lpage_info for the time being, until we actually do
> > support larger pages.
>
> I don't want to completely punt on this, because if it gets messy, then I want
> to know now and have a solution in hand, not find out N months from now.
>
> That said, I don't expect it to be difficult. What we could punt on is
> performance of the lookups, which is the real reason KVM maintains the rather
> expensive disallow_lpage array.
>
> And that said, memslots can only bind to one guest_memfd instance, so I don't
> immediately see any reason why the guest_memfd ioctl() couldn't process the
> slots that are bound to it. I.e. why not update KVM_LPAGE_MIXED_FLAG from the
> guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
I am missing the point of updating KVM_LPAGE_MIXED_FLAG for the
scenarios where in-place memory conversion will be supported with
guest_memfd, as guest_memfd support for hugepages comes with the
design that hugepages can't have mixed attributes, i.e. the max_order
returned by get_pfn will always cover a folio range with uniform
attributes.
Is your suggestion around using the guest_memfd ioctl() to also toggle
memory attributes for the scenarios where the guest_memfd instance
doesn't have the in-place memory conversion feature enabled?
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-06 5:17 ` Vishal Annapurve
@ 2025-05-06 5:28 ` Vishal Annapurve
2025-05-06 13:58 ` Sean Christopherson
0 siblings, 1 reply; 63+ messages in thread
From: Vishal Annapurve @ 2025-05-06 5:28 UTC (permalink / raw)
To: Sean Christopherson
Cc: David Hildenbrand, Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm,
linux-mm, pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer,
aou, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta
On Mon, May 5, 2025 at 10:17 PM Vishal Annapurve <vannapurve@google.com> wrote:
>
> On Mon, May 5, 2025 at 3:57 PM Sean Christopherson <seanjc@google.com> wrote:
> > > ...
> > > And not worry about lpage_info for the time being, until we actually do
> > > support larger pages.
> >
> > I don't want to completely punt on this, because if it gets messy, then I want
> > to know now and have a solution in hand, not find out N months from now.
> >
> > That said, I don't expect it to be difficult. What we could punt on is
> > performance of the lookups, which is the real reason KVM maintains the rather
> > expensive disallow_lpage array.
> >
> > And that said, memslots can only bind to one guest_memfd instance, so I don't
> > immediately see any reason why the guest_memfd ioctl() couldn't process the
> > slots that are bound to it. I.e. why not update KVM_LPAGE_MIXED_FLAG from the
> > guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
>
> I am missing the point of updating KVM_LPAGE_MIXED_FLAG for the
> scenarios where in-place memory conversion will be supported with
> guest_memfd, as guest_memfd support for hugepages comes with the
> design that hugepages can't have mixed attributes, i.e. the max_order
> returned by get_pfn will always cover a folio range with uniform
> attributes.
>
> Is your suggestion around using the guest_memfd ioctl() to also toggle
> memory attributes for the scenarios where the guest_memfd instance
> doesn't have the in-place memory conversion feature enabled?
Reading more into your response, I guess your suggestion is about
covering different usecases present today and new usecases which may
land in future, that rely on kvm_lpage_info for faster lookup. If so,
then it should be easy to modify guest_memfd ioctl to update
kvm_lpage_info as you suggested.
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-05-02 22:29 ` Ackerley Tng
@ 2025-05-06 8:47 ` Yan Zhao
0 siblings, 0 replies; 63+ messages in thread
From: Yan Zhao @ 2025-05-06 8:47 UTC (permalink / raw)
To: Ackerley Tng
Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
mpe, anup, paul.walmsley, palmer, aou, seanjc, viro, brauner,
willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On Fri, May 02, 2025 at 03:29:53PM -0700, Ackerley Tng wrote:
> Fuad Tabba <tabba@google.com> writes:
>
> > Add support for mmap() and fault() for guest_memfd backed memory
> > in the host for VMs that support in-place conversion between
> > shared and private. To that end, this patch adds the ability to
> > check whether the VM type supports in-place conversion, and only
> > allows mapping its memory if that's the case.
> >
> > This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
> > which enables support for in-place shared memory.
> >
> > It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> > indicates that the host can create VMs that support shared memory.
> > Supporting shared memory implies that memory can be mapped when shared
> > with the host.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > include/linux/kvm_host.h | 15 ++++++-
> > include/uapi/linux/kvm.h | 1 +
> > virt/kvm/Kconfig | 5 +++
> > virt/kvm/guest_memfd.c | 92 ++++++++++++++++++++++++++++++++++++++++
> > virt/kvm/kvm_main.c | 4 ++
> > 5 files changed, 116 insertions(+), 1 deletion(-)
> >
> > <snip>
>
> At the guest_memfd call on 2025-05-01, we discussed that if guest_memfd
> is created with GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then if
> slot->userspace_addr != 0, we would validate that the folio
> slot->userspace_addr points to matches up with the folio guest_memfd
> would return for the same offset.
Where will the validation be executed? In kvm_gmem_bind()?
>
> I can think of one way to do this validation, which is to call KVM's
> hva_to_pfn() function and then call kvm_gmem_get_folio() on the fd and
> offset, and then check that the PFNs are equal.
>
> However, that would cause the page to be allocated. Any ideas on how we
> could do this validation without allocating the page?
If the check is in kvm_gmem_bind() and if there's no worry about munmap() and
re-mmap() of the shared memory pointed to by slot->userspace_addr, maybe
something like the below?
mm = kvm->mm;
mmap_read_lock(mm);
vma = vma_lookup(mm, slot->userspace_addr);
if (!vma || vma->vm_file != file) {
	mmap_read_unlock(mm);
	return -EINVAL;
}
pgoff = ((slot->userspace_addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
mmap_read_unlock(mm);
Then check whether pgoff equals slot->gmem.pgoff.
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-05-05 21:06 ` Ira Weiny
@ 2025-05-06 12:15 ` Fuad Tabba
0 siblings, 0 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-05-06 12:15 UTC (permalink / raw)
To: Ira Weiny
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
Hi Ira,
On Mon, 5 May 2025 at 22:05, Ira Weiny <ira.weiny@intel.com> wrote:
>
> Fuad Tabba wrote:
> > Add support for mmap() and fault() for guest_memfd backed memory
> > in the host for VMs that support in-place conversion between
> > shared and private. To that end, this patch adds the ability to
> > check whether the VM type supports in-place conversion, and only
> > allows mapping its memory if that's the case.
> >
> > This patch introduces the configuration option KVM_GMEM_SHARED_MEM,
> > which enables support for in-place shared memory.
> >
> > It also introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> > indicates that the host can create VMs that support shared memory.
> > Supporting shared memory implies that memory can be mapped when shared
> > with the host.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > include/linux/kvm_host.h | 15 ++++++-
> > include/uapi/linux/kvm.h | 1 +
> > virt/kvm/Kconfig | 5 +++
> > virt/kvm/guest_memfd.c | 92 ++++++++++++++++++++++++++++++++++++++++
> > virt/kvm/kvm_main.c | 4 ++
> > 5 files changed, 116 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 9419fb99f7c2..f3af6bff3232 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -729,6 +729,17 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> > }
> > #endif
> >
> > +/*
> > + * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
> > + * private memory is enabled and it supports in-place shared/private conversion.
> > + */
> > +#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
>
> Perhaps the bots already caught this?
>
> I just tried enabling KVM_GMEM_SHARED_MEM on x86 with this patch and it fails with:
>
> || In file included from arch/x86/kvm/../../../virt/kvm/binary_stats.c:8:
> || ./include/linux/kvm_host.h: In function ‘kvm_mem_from_gmem’:
> include/linux/kvm_host.h|2530 col 13| error: implicit declaration of function ‘kvm_arch_gmem_supports_shared_mem’ [-Wimplicit-function-declaration]
> || 2530 | if (kvm_arch_gmem_supports_shared_mem(kvm))
> || | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> || make[4]: *** Waiting for unfinished jobs....
>
>
> I think the predicate on !CONFIG_KVM_GMEM_SHARED_MEM is wrong.
>
> Shouldn't this always default off? I __think__ this then gets enabled in
> 11/13?
You're right. With the other comments from David and Ackerley, this
function is gone, replaced by checking a per-VM flag.
Thanks,
/fuad
> IOW
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f3af6bff3232..577674e95c09 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -733,7 +733,7 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
> * private memory is enabled and it supports in-place shared/private conversion.
> */
> -#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
> +#if !defined(kvm_arch_gmem_supports_shared_mem)
> static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
> {
> return false;
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-06 5:28 ` Vishal Annapurve
@ 2025-05-06 13:58 ` Sean Christopherson
2025-05-06 14:15 ` David Hildenbrand
0 siblings, 1 reply; 63+ messages in thread
From: Sean Christopherson @ 2025-05-06 13:58 UTC (permalink / raw)
To: Vishal Annapurve
Cc: David Hildenbrand, Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm,
linux-mm, pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer,
aou, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta
On Mon, May 05, 2025, Vishal Annapurve wrote:
> On Mon, May 5, 2025 at 10:17 PM Vishal Annapurve <vannapurve@google.com> wrote:
> >
> > On Mon, May 5, 2025 at 3:57 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > ...
> > > > And not worry about lpage_info for the time being, until we actually do
> > > > support larger pages.
> > >
> > > I don't want to completely punt on this, because if it gets messy, then I want
> > > to know now and have a solution in hand, not find out N months from now.
> > >
> > > That said, I don't expect it to be difficult. What we could punt on is
> > > performance of the lookups, which is the real reason KVM maintains the rather
> > > expensive disallow_lpage array.
> > >
> > > And that said, memslots can only bind to one guest_memfd instance, so I don't
> > > immediately see any reason why the guest_memfd ioctl() couldn't process the
> > > slots that are bound to it. I.e. why not update KVM_LPAGE_MIXED_FLAG from the
> > > guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
> >
> > I am missing the point of updating KVM_LPAGE_MIXED_FLAG for the
> > scenarios where in-place memory conversion will be supported with
> > guest_memfd, as guest_memfd support for hugepages comes with the
> > design that hugepages can't have mixed attributes, i.e. the max_order
> > returned by get_pfn will always cover a folio range with uniform
> > attributes.
Oh, if this will naturally be handled by guest_memfd, then do that. I was purely
reacting to David's suggestion to "not worry about lpage_info for the time being,
until we actually do support larger pages".
> > Is your suggestion around using the guest_memfd ioctl() to also toggle
> > memory attributes for the scenarios where the guest_memfd instance
> > doesn't have the in-place memory conversion feature enabled?
>
> Reading more into your response, I guess your suggestion is about
> covering different usecases present today and new usecases which may
> land in future, that rely on kvm_lpage_info for faster lookup. If so,
> then it should be easy to modify guest_memfd ioctl to update
> kvm_lpage_info as you suggested.
Nah, I just missed/forgot that using a single guest_memfd for private and shared
would naturally need to split the folio and thus this would Just Work.
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-06 13:58 ` Sean Christopherson
@ 2025-05-06 14:15 ` David Hildenbrand
2025-05-06 20:46 ` Ackerley Tng
0 siblings, 1 reply; 63+ messages in thread
From: David Hildenbrand @ 2025-05-06 14:15 UTC (permalink / raw)
To: Sean Christopherson, Vishal Annapurve
Cc: Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini,
chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, mail, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta
On 06.05.25 15:58, Sean Christopherson wrote:
> On Mon, May 05, 2025, Vishal Annapurve wrote:
>> On Mon, May 5, 2025 at 10:17 PM Vishal Annapurve <vannapurve@google.com> wrote:
>>>
>>> On Mon, May 5, 2025 at 3:57 PM Sean Christopherson <seanjc@google.com> wrote:
>>>>> ...
>>>>> And not worry about lpage_info for the time being, until we actually do
>>>>> support larger pages.
>>>>
>>>> I don't want to completely punt on this, because if it gets messy, then I want
>>>> to know now and have a solution in hand, not find out N months from now.
>>>>
>>>> That said, I don't expect it to be difficult. What we could punt on is
>>>> performance of the lookups, which is the real reason KVM maintains the rather
>>>> expensive disallow_lpage array.
>>>>
>>>> And that said, memslots can only bind to one guest_memfd instance, so I don't
>>>> immediately see any reason why the guest_memfd ioctl() couldn't process the
>>>> slots that are bound to it. I.e. why not update KVM_LPAGE_MIXED_FLAG from the
>>>> guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
>>>
>>> I am missing the point of updating KVM_LPAGE_MIXED_FLAG for the
>>> scenarios where in-place memory conversion will be supported with
>>> guest_memfd, as guest_memfd support for hugepages comes with the
>>> design that hugepages can't have mixed attributes, i.e. the max_order
>>> returned by get_pfn will always cover a folio range with uniform
>>> attributes.
>
> Oh, if this will naturally be handled by guest_memfd, then do that. I was purely
> reacting to David's suggestion to "not worry about lpage_info for the time being,
> until we actually do support larger pages".
>
>>> Is your suggestion around using the guest_memfd ioctl() to also toggle
>>> memory attributes for the scenarios where the guest_memfd instance
>>> doesn't have the in-place memory conversion feature enabled?
>>
>> Reading more into your response, I guess your suggestion is about
>> covering different usecases present today and new usecases which may
>> land in future, that rely on kvm_lpage_info for faster lookup. If so,
>> then it should be easy to modify guest_memfd ioctl to update
>> kvm_lpage_info as you suggested.
>
> Nah, I just missed/forgot that using a single guest_memfd for private and shared
> would naturally need to split the folio and thus this would Just Work.
Yeah, I ignored that fact as well. So essentially, this patch should be
mostly good for now.
Only kvm_mmu_hugepage_adjust() must be taught to not rely on
fault->is_private.
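I.e., presumably just passing the gmem property instead (assuming the
fault->is_gmem naming from earlier in this series; otherwise the call is
what kvm_mmu_hugepage_adjust() already does today):

	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
						       fault->gfn, fault->max_level,
						       fault->is_gmem);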
Once we support large folios in guest_memfd, only the "alignment"
consideration might have to be taken into account.
Anything else?
--
Cheers,
David / dhildenb
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-05 22:57 ` Sean Christopherson
2025-05-06 5:17 ` Vishal Annapurve
@ 2025-05-06 19:27 ` Ackerley Tng
1 sibling, 0 replies; 63+ messages in thread
From: Ackerley Tng @ 2025-05-06 19:27 UTC (permalink / raw)
To: Sean Christopherson, David Hildenbrand
Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
mpe, anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
Yan Zhao
Sean Christopherson <seanjc@google.com> writes:
> <snip>
>
> ... we "just" need
> to update that to constrain the max order based on shared vs. private. E.g. from
> the original guest_memfd hugepage support[*] (which never landed), to take care
> of the pgoff not being properly aligned to the memslot.
>
> + /*
> + * The folio can be mapped with a hugepage if and only if the folio is
> + * fully contained by the range the memslot is bound to. Note, the
> + * caller is responsible for handling gfn alignment, this only deals
> + * with the file binding.
> + */
> + huge_index = ALIGN(index, 1ull << *max_order);
> + if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
> + huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
> *max_order = 0;
>
> [*] https://lore.kernel.org/all/20231027182217.3615211-18-seanjc@google.com
>
Regarding this alignment check, did you also consider checking at
memslot binding time? Would this [1] work/be better?
[1] https://lore.kernel.org/all/diqz1pt1sfw8.fsf@ackerleytng-ctop.c.googlers.com/
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-06 14:15 ` David Hildenbrand
@ 2025-05-06 20:46 ` Ackerley Tng
2025-05-08 14:12 ` Sean Christopherson
` (2 more replies)
0 siblings, 3 replies; 63+ messages in thread
From: Ackerley Tng @ 2025-05-06 20:46 UTC (permalink / raw)
To: David Hildenbrand, Sean Christopherson, Vishal Annapurve
Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
mpe, anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
David Hildenbrand <david@redhat.com> writes:
> On 06.05.25 15:58, Sean Christopherson wrote:
>> On Mon, May 05, 2025, Vishal Annapurve wrote:
>>> On Mon, May 5, 2025 at 10:17 PM Vishal Annapurve <vannapurve@google.com> wrote:
>>>>
>>>> On Mon, May 5, 2025 at 3:57 PM Sean Christopherson <seanjc@google.com> wrote:
>>>>>> ...
>>>>>> And not worry about lpage_info for the time being, until we actually do
>>>>>> support larger pages.
>>>>>
>>>>> I don't want to completely punt on this, because if it gets messy, then I want
>>>>> to know now and have a solution in hand, not find out N months from now.
>>>>>
>>>>> That said, I don't expect it to be difficult. What we could punt on is
>>>>> performance of the lookups, which is the real reason KVM maintains the rather
>>>>> expensive disallow_lpage array.
>>>>>
>>>>> And that said, memslots can only bind to one guest_memfd instance, so I don't
>>>>> immediately see any reason why the guest_memfd ioctl() couldn't process the
>>>>> slots that are bound to it. I.e. why not update KVM_LPAGE_MIXED_FLAG from the
>>>>> guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
>>>>
>>>> I am missing the point here to update KVM_LPAGE_MIXED_FLAG for the
>>>> scenarios where in-place memory conversion will be supported with
>>>> guest_memfd. As guest_memfd support for hugepages comes with the
>>>> design that hugepages can't have mixed attributes. i.e. max_order
>>>> returned by get_pfn will always have the same attributes for the folio
>>>> range.
>>
>> Oh, if this will naturally be handled by guest_memfd, then do that. I was purely
>> reacting to David's suggestion to "not worry about lpage_info for the time being,
>> until we actually do support larger pages".
>>
>>>> Is your suggestion around using guest_memfd ioctl() to also toggle
>>>> memory attributes for the scenarios where guest_memfd instance doesn't
>>>> have in-place memory conversion feature enabled?
>>>
>>> Reading more into your response, I guess your suggestion is about
>>> covering different usecases present today and new usecases which may
>>> land in future, that rely on kvm_lpage_info for faster lookup. If so,
>>> then it should be easy to modify guest_memfd ioctl to update
>>> kvm_lpage_info as you suggested.
>>
>> Nah, I just missed/forgot that using a single guest_memfd for private and shared
>> would naturally need to split the folio and thus this would Just Work.
Sean, David, I'm circling back to make sure I'm following the discussion
correctly before Fuad sends out the next revision of this series.
>
> Yeah, I ignored that fact as well. So essentially, this patch should be
> mostly good for now.
>
From here [1], these changes will make it to v9
+ kvm_max_private_mapping_level renaming to kvm_max_gmem_mapping_level
+ kvm_mmu_faultin_pfn_private renaming to kvm_mmu_faultin_pfn_gmem
> Only kvm_mmu_hugepage_adjust() must be taught to not rely on
> fault->is_private.
>
I think fault->is_private should contribute to determining the max
mapping level.
By the time kvm_mmu_hugepage_adjust() is called,
* For CoCo VMs using guest_memfd only for private memory,
  * fault->is_private would have been checked to align with
    kvm->mem_attr_array, so
* For CoCo VMs using guest_memfd for both private/shared memory,
  * fault->is_private would have been checked to align with
    guest_memfd's shareability
* For non-CoCo VMs using guest_memfd,
  * fault->is_private would be false
Hence fault->is_private can be relied on when calling
kvm_mmu_hugepage_adjust().
If fault->is_private, there will be no host userspace mapping to check,
hence in __kvm_mmu_max_mapping_level(), we should skip querying host
page tables.
If !fault->is_private, for shared memory ranges, if the VM uses
guest_memfd only for shared memory, we should query host page tables.
If !fault->is_private, for shared memory ranges, if the VM uses
guest_memfd for both shared/private memory, we should not query host
page tables.
If !fault->is_private, for non-CoCo VMs, we should not query host page
tables.
I propose to rename the parameter is_private to skip_host_page_tables,
so
- if (is_private)
+ if (skip_host_page_tables)
return max_level;
and pass
skip_host_page_tables = fault->is_private ||
kvm_gmem_memslot_supports_shared(fault->slot);
where kvm_gmem_memslot_supports_shared() checks the inode in the memslot
for GUEST_MEMFD_FLAG_SUPPORT_SHARED.
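Roughly, as a sketch (assuming the gmem flags are stashed in
inode->i_private the way guest_memfd stores its creation flags today;
exact names TBD):

	static inline bool
	kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
	{
		u64 flags;

		if (!kvm_slot_has_gmem(slot) || !slot->gmem.file)
			return false;

		/* guest_memfd records its creation flags in i_private. */
		flags = (u64)(unsigned long)file_inode(slot->gmem.file)->i_private;

		return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
	}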
For recover_huge_pages_range(), the other user of
__kvm_mmu_max_mapping_level(), currently there's no prior call to
kvm_gmem_get_pfn() to get max_order or max_level, so I propose to call
__kvm_mmu_max_mapping_level() with
if (kvm_gmem_memslot_supports_shared(slot)) {
max_level = kvm_gmem_max_mapping_level(slot, gfn);
skip_host_page_tables = true;
} else {
max_level = PG_LEVEL_NUM;
skip_host_page_tables = kvm_slot_has_gmem(slot) &&
kvm_mem_is_private(kvm, gfn);
}
Without 1G support, kvm_gmem_max_mapping_level(slot, gfn) would always
return 4K.
With 1G support, kvm_gmem_max_mapping_level(slot, gfn) would return the
level for the page's order, at the offset corresponding to the gfn.
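For illustration, __kvm_mmu_max_mapping_level() would then look roughly
like this (a sketch against the current x86 shape of the function, not
the final diff):

	static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
					       const struct kvm_memory_slot *slot,
					       gfn_t gfn, int max_level,
					       bool skip_host_page_tables)
	{
		struct kvm_lpage_info *linfo;
		int host_level;

		max_level = min(max_level, max_huge_page_level);
		for ( ; max_level > PG_LEVEL_4K; max_level--) {
			linfo = lpage_info_slot(gfn, slot, max_level);
			if (!linfo->disallow_lpage)
				break;
		}

		/* No host userspace mapping to consult for gmem-backed memory. */
		if (skip_host_page_tables)
			return max_level;

		if (max_level == PG_LEVEL_4K)
			return PG_LEVEL_4K;

		host_level = host_pfn_mapping_level(kvm, gfn, slot);
		return min(host_level, max_level);
	}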
> Once we support large folios in guest_memfd, only the "alignment"
> consideration might have to be taken into account.
>
I'll be handling this alignment as part of the 1G page support series
(won't be part of Fuad's first stage series) [2]
> Anything else?
>
> --
> Cheers,
>
> David / dhildenb
[1] https://lore.kernel.org/all/20250430165655.605595-7-tabba@google.com/
[2] https://lore.kernel.org/all/diqz1pt1sfw8.fsf@ackerleytng-ctop.c.googlers.com/
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-06 20:46 ` Ackerley Tng
@ 2025-05-08 14:12 ` Sean Christopherson
2025-05-08 14:46 ` David Hildenbrand
2025-05-09 21:04 ` James Houghton
2 siblings, 0 replies; 63+ messages in thread
From: Sean Christopherson @ 2025-05-08 14:12 UTC (permalink / raw)
To: Ackerley Tng
Cc: David Hildenbrand, Vishal Annapurve, Fuad Tabba, kvm,
linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
On Tue, May 06, 2025, Ackerley Tng wrote:
> Sean, David, I'm circling back to make sure I'm following the discussion
> correctly before Fuad sends out the next revision of this series.
Honestly, just send the next version. Trying to review a description of code is
an exercise in frustration. More versions of a series aren't inherently bad.
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-06 20:46 ` Ackerley Tng
2025-05-08 14:12 ` Sean Christopherson
@ 2025-05-08 14:46 ` David Hildenbrand
2025-05-09 21:04 ` James Houghton
2 siblings, 0 replies; 63+ messages in thread
From: David Hildenbrand @ 2025-05-08 14:46 UTC (permalink / raw)
To: Ackerley Tng, Sean Christopherson, Vishal Annapurve
Cc: Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
mpe, anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
>>
>> Yeah, I ignored that fact as well. So essentially, this patch should be
>> mostly good for now.
>>
>
> From here [1], these changes will make it to v9
>
> + kvm_max_private_mapping_level renaming to kvm_max_gmem_mapping_level
> + kvm_mmu_faultin_pfn_private renaming to kvm_mmu_faultin_pfn_gmem
>
>> Only kvm_mmu_hugepage_adjust() must be taught to not rely on
>> fault->is_private.
>>
>
> I think fault->is_private should contribute to determining the max
> mapping level.
>
> By the time kvm_mmu_hugepage_adjust() is called,
>
> * For CoCo VMs using guest_memfd only for private memory,
>   * fault->is_private would have been checked to align with
>     kvm->mem_attr_array, so
> * For CoCo VMs using guest_memfd for both private/shared memory,
>   * fault->is_private would have been checked to align with
>     guest_memfd's shareability
> * For non-CoCo VMs using guest_memfd,
>   * fault->is_private would be false
But as Sean said, looking at the code might be easier.
Maybe just send the resulting diff of the patch here real quick?
>
> Hence fault->is_private can be relied on when calling
> kvm_mmu_hugepage_adjust().
>
> If fault->is_private, there will be no host userspace mapping to check,
> hence in __kvm_mmu_max_mapping_level(), we should skip querying host
> page tables.
>
> If !fault->is_private, for shared memory ranges, if the VM uses
> guest_memfd only for shared memory, we should query host page tables.
>
> If !fault->is_private, for shared memory ranges, if the VM uses
> guest_memfd for both shared/private memory, we should not query host
> page tables.
>
> If !fault->is_private, for non-CoCo VMs, we should not query host page
> tables.
>
> I propose to rename the parameter is_private to skip_host_page_tables,
> so
>
> - if (is_private)
> + if (skip_host_page_tables)
> return max_level;
>
> and pass
>
> skip_host_page_tables = fault->is_private ||
> kvm_gmem_memslot_supports_shared(fault->slot);
>
How is that better than calling it "is_gmem" / "from_gmem" etc? :)
Anyhow, no strong opinion, spelling out that something is from gmem
implies that we don't care about page tables.
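E.g., at the call site, something like (a sketch, reusing the helper
proposed above; the exact spot where req_level gets set may differ):

	bool is_gmem = fault->is_private ||
		       kvm_gmem_memslot_supports_shared(fault->slot);

	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
						       fault->gfn,
						       fault->max_level,
						       is_gmem);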
> where kvm_gmem_memslot_supports_shared() checks the inode in the memslot
> for GUEST_MEMFD_FLAG_SUPPORT_SHARED.
Makes sense.
--
Cheers,
David / dhildenb
* Re: [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults
2025-04-30 16:56 ` [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
@ 2025-05-09 20:15 ` James Houghton
2025-05-12 7:07 ` Fuad Tabba
0 siblings, 1 reply; 63+ messages in thread
From: James Houghton @ 2025-05-09 20:15 UTC (permalink / raw)
To: tabba
Cc: ackerleytng, akpm, amoorthy, anup, aou, brauner, catalin.marinas,
chao.p.peng, chenhuacai, david, dmatlack, fvdl, hch, hughd,
isaku.yamahata, isaku.yamahata, james.morse, jarkko, jgg,
jhubbard, jthoughton, keirf, kirill.shutemov, kvm, liam.merwick,
linux-arm-msm, linux-mm, mail, maz, mic, michael.roth, mpe,
oliver.upton, palmer, pankaj.gupta, paul.walmsley, pbonzini,
peterx, qperret, quic_cvanscha, quic_eberman, quic_mnalajal,
quic_pderrin, quic_pheragu, quic_svaddagi, quic_tsoni, rientjes,
roypat, seanjc, shuah, steven.price, suzuki.poulose, vannapurve,
vbabka, viro, wei.w.wang, will, willy, xiaoyao.li, yilun.xu,
yuzenghui
On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
>
> Add arm64 support for handling guest page faults on guest_memfd
> backed memslots.
>
> For now, the fault granule is restricted to PAGE_SIZE.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> arch/arm64/kvm/mmu.c | 65 +++++++++++++++++++++++++++-------------
> include/linux/kvm_host.h | 5 ++++
> virt/kvm/kvm_main.c | 5 ----
> 3 files changed, 50 insertions(+), 25 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 148a97c129de..d1044c7f78bb 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1466,6 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> return vma->vm_flags & VM_MTE_ALLOWED;
> }
>
> +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> + gfn_t gfn, bool write_fault, bool *writable,
> + struct page **page, bool is_gmem)
> +{
> + kvm_pfn_t pfn;
> + int ret;
> +
> + if (!is_gmem)
> + return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> +
> + *writable = false;
> +
> + ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> + if (!ret) {
> + *writable = !memslot_is_readonly(slot);
> + return pfn;
> + }
> +
> + if (ret == -EHWPOISON)
> + return KVM_PFN_ERR_HWPOISON;
> +
> + return KVM_PFN_ERR_NOSLOT_MASK;
> +}
> +
> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> struct kvm_s2_trans *nested,
> struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1473,19 +1497,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> {
> int ret = 0;
> bool write_fault, writable;
> - bool exec_fault, mte_allowed;
> + bool exec_fault, mte_allowed = false;
> bool device = false, vfio_allow_any_uc = false;
> unsigned long mmu_seq;
> phys_addr_t ipa = fault_ipa;
> struct kvm *kvm = vcpu->kvm;
> - struct vm_area_struct *vma;
> + struct vm_area_struct *vma = NULL;
> short vma_shift;
> void *memcache;
> - gfn_t gfn;
> + gfn_t gfn = ipa >> PAGE_SHIFT;
> kvm_pfn_t pfn;
> bool logging_active = memslot_is_logging(memslot);
> - bool force_pte = logging_active || is_protected_kvm_enabled();
> - long vma_pagesize, fault_granule;
> + bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
> + bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> + long vma_pagesize, fault_granule = PAGE_SIZE;
> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> struct kvm_pgtable *pgt;
> struct page *page;
> @@ -1522,16 +1547,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> return ret;
> }
>
> + mmap_read_lock(current->mm);
We don't have to take the mmap_lock for gmem faults, right?
I think we should reorganize user_mem_abort() a bit (and I think vma_pagesize
and maybe vma_shift should be renamed) given the changes we're making here.
Below is a diff that I think might be a little cleaner. Let me know what you
think.
> +
> /*
> * Let's check if we will get back a huge page backed by hugetlbfs, or
> * get block mapping for device MMIO region.
> */
> - mmap_read_lock(current->mm);
> - vma = vma_lookup(current->mm, hva);
> - if (unlikely(!vma)) {
> - kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> - mmap_read_unlock(current->mm);
> - return -EFAULT;
> + if (!is_gmem) {
> + vma = vma_lookup(current->mm, hva);
> + if (unlikely(!vma)) {
> + kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> + mmap_read_unlock(current->mm);
> + return -EFAULT;
> + }
> +
> + vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> + mte_allowed = kvm_vma_mte_allowed(vma);
> }
>
> if (force_pte)
> @@ -1602,18 +1633,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> ipa &= ~(vma_pagesize - 1);
> }
>
> - gfn = ipa >> PAGE_SHIFT;
> - mte_allowed = kvm_vma_mte_allowed(vma);
> -
> - vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> -
> /* Don't use the VMA after the unlock -- it may have vanished */
> vma = NULL;
>
> /*
> * Read mmu_invalidate_seq so that KVM can detect if the results of
> - * vma_lookup() or __kvm_faultin_pfn() become stale prior to
> - * acquiring kvm->mmu_lock.
> + * vma_lookup() or faultin_pfn() become stale prior to acquiring
> + * kvm->mmu_lock.
> *
> * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> * with the smp_wmb() in kvm_mmu_invalidate_end().
> @@ -1621,8 +1647,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> mmap_read_unlock(current->mm);
>
> - pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> - &writable, &page);
> + pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
> if (pfn == KVM_PFN_ERR_HWPOISON) {
I think we need to take care to handle HWPOISON properly. I know that it is
(or will most likely be) the case that GUP(hva) --> pfn, but with gmem,
it *might* not be the case. So the following line isn't right.
I think we need to handle HWPOISON for gmem using memory fault exits instead of
sending a SIGBUS to userspace. This would be consistent with how KVM/x86
today handles getting a HWPOISON page back from kvm_gmem_get_pfn(). I'm not
entirely sure how KVM/x86 is meant to handle HWPOISON on shared gmem pages yet;
I need to keep reading your series.
The reorganization diff below leaves this unfixed.
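Something along these lines, as a sketch (kvm_prepare_memory_fault_exit()
is how x86 reports gmem errors today; the exact flags to pass are TBD):

	if (pfn == KVM_PFN_ERR_HWPOISON) {
		if (is_gmem) {
			/*
			 * No meaningful hva for gmem-backed faults; report
			 * a memory fault exit instead of raising SIGBUS.
			 */
			kvm_prepare_memory_fault_exit(vcpu, fault_ipa,
						      PAGE_SIZE, write_fault,
						      exec_fault, false);
			return -EFAULT;
		}
		kvm_send_hwpoison_signal(hva, vma_shift);
		return 0;
	}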
> kvm_send_hwpoison_signal(hva, vma_shift);
> return 0;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f3af6bff3232..1b2e4e9a7802 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
> return gfn_to_memslot(kvm, gfn)->id;
> }
>
> +static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> +{
> + return slot->flags & KVM_MEM_READONLY;
> +}
> +
> static inline gfn_t
> hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
> {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c75d8e188eb7..d9bca5ba19dc 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2640,11 +2640,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
> return size;
> }
>
> -static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> -{
> - return slot->flags & KVM_MEM_READONLY;
> -}
> -
> static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
> gfn_t *nr_pages, bool write)
> {
> --
> 2.49.0.901.g37484f566f-goog
Thanks, Fuad! Here's the reorganization/rename diff:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d1044c7f78bba..c9eb72fe9013b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1502,7 +1502,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
unsigned long mmu_seq;
phys_addr_t ipa = fault_ipa;
struct kvm *kvm = vcpu->kvm;
- struct vm_area_struct *vma = NULL;
short vma_shift;
void *memcache;
gfn_t gfn = ipa >> PAGE_SHIFT;
@@ -1510,7 +1509,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
bool logging_active = memslot_is_logging(memslot);
bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
- long vma_pagesize, fault_granule = PAGE_SIZE;
+ long target_size = PAGE_SIZE, fault_granule = PAGE_SIZE;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
struct page *page;
@@ -1547,13 +1546,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return ret;
}
- mmap_read_lock(current->mm);
-
/*
* Let's check if we will get back a huge page backed by hugetlbfs, or
* get block mapping for device MMIO region.
*/
if (!is_gmem) {
+ struct vm_area_struct *vma = NULL;
+
+ mmap_read_lock(current->mm);
+
vma = vma_lookup(current->mm, hva);
if (unlikely(!vma)) {
kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
@@ -1563,38 +1564,45 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
mte_allowed = kvm_vma_mte_allowed(vma);
- }
-
- if (force_pte)
- vma_shift = PAGE_SHIFT;
- else
- vma_shift = get_vma_page_shift(vma, hva);
+ vma_shift = force_pte ? PAGE_SHIFT : get_vma_page_shift(vma, hva);
- switch (vma_shift) {
+ switch (vma_shift) {
#ifndef __PAGETABLE_PMD_FOLDED
- case PUD_SHIFT:
- if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
- break;
- fallthrough;
+ case PUD_SHIFT:
+ if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
+ break;
+ fallthrough;
#endif
- case CONT_PMD_SHIFT:
- vma_shift = PMD_SHIFT;
- fallthrough;
- case PMD_SHIFT:
- if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
+ case CONT_PMD_SHIFT:
+ vma_shift = PMD_SHIFT;
+ fallthrough;
+ case PMD_SHIFT:
+ if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
+ break;
+ fallthrough;
+ case CONT_PTE_SHIFT:
+ vma_shift = PAGE_SHIFT;
+ force_pte = true;
+ fallthrough;
+ case PAGE_SHIFT:
break;
- fallthrough;
- case CONT_PTE_SHIFT:
- vma_shift = PAGE_SHIFT;
- force_pte = true;
- fallthrough;
- case PAGE_SHIFT:
- break;
- default:
- WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
- }
+ default:
+ WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
+ }
- vma_pagesize = 1UL << vma_shift;
+ /*
+ * Read mmu_invalidate_seq so that KVM can detect if the results of
+ * vma_lookup() or faultin_pfn() become stale prior to acquiring
+ * kvm->mmu_lock.
+ *
+ * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
+ * with the smp_wmb() in kvm_mmu_invalidate_end().
+ */
+ mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+ mmap_read_unlock(current->mm);
+
+ target_size = 1UL << vma_shift;
+ }
if (nested) {
unsigned long max_map_size;
@@ -1620,7 +1628,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
max_map_size = PAGE_SIZE;
force_pte = (max_map_size == PAGE_SIZE);
- vma_pagesize = min(vma_pagesize, (long)max_map_size);
+ target_size = min(target_size, (long)max_map_size);
}
/*
@@ -1628,27 +1636,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* ensure we find the right PFN and lay down the mapping in the right
* place.
*/
- if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
- fault_ipa &= ~(vma_pagesize - 1);
- ipa &= ~(vma_pagesize - 1);
+ if (target_size == PMD_SIZE || target_size == PUD_SIZE) {
+ fault_ipa &= ~(target_size - 1);
+ ipa &= ~(target_size - 1);
}
- /* Don't use the VMA after the unlock -- it may have vanished */
- vma = NULL;
-
- /*
- * Read mmu_invalidate_seq so that KVM can detect if the results of
- * vma_lookup() or faultin_pfn() become stale prior to acquiring
- * kvm->mmu_lock.
- *
- * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
- * with the smp_wmb() in kvm_mmu_invalidate_end().
- */
- mmu_seq = vcpu->kvm->mmu_invalidate_seq;
- mmap_read_unlock(current->mm);
-
pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
if (pfn == KVM_PFN_ERR_HWPOISON) {
+ // TODO: Handle gmem properly. vma_shift
+ // intentionally left uninitialized.
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
}
@@ -1658,9 +1654,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (kvm_is_device_pfn(pfn)) {
/*
* If the page was identified as device early by looking at
- * the VMA flags, vma_pagesize is already representing the
+ * the VMA flags, target_size is already representing the
* largest quantity we can map. If instead it was mapped
- * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
+ * via __kvm_faultin_pfn(), target_size is set to PAGE_SIZE
* and must not be upgraded.
*
* In both cases, we don't let transparent_hugepage_adjust()
@@ -1699,7 +1695,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_fault_lock(kvm);
pgt = vcpu->arch.hw_mmu->pgt;
- if (mmu_invalidate_retry(kvm, mmu_seq)) {
+ if (!is_gmem && mmu_invalidate_retry(kvm, mmu_seq)) {
ret = -EAGAIN;
goto out_unlock;
}
@@ -1708,16 +1704,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* If we are not forced to use page mapping, check if we are
* backed by a THP and thus use block mapping if possible.
*/
- if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
+ if (target_size == PAGE_SIZE && !(force_pte || device)) {
if (fault_is_perm && fault_granule > PAGE_SIZE)
- vma_pagesize = fault_granule;
- else
- vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
+ target_size = fault_granule;
+ else if (!is_gmem)
+ target_size = transparent_hugepage_adjust(kvm, memslot,
hva, &pfn,
&fault_ipa);
- if (vma_pagesize < 0) {
- ret = vma_pagesize;
+ if (target_size < 0) {
+ ret = target_size;
goto out_unlock;
}
}
@@ -1725,7 +1721,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
/* Check the VMM hasn't introduced a new disallowed VMA */
if (mte_allowed) {
- sanitise_mte_tags(kvm, pfn, vma_pagesize);
+ sanitise_mte_tags(kvm, pfn, target_size);
} else {
ret = -EFAULT;
goto out_unlock;
@@ -1750,10 +1746,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/*
* Under the premise of getting a FSC_PERM fault, we just need to relax
- * permissions only if vma_pagesize equals fault_granule. Otherwise,
+ * permissions only if target_size equals fault_granule. Otherwise,
* kvm_pgtable_stage2_map() should be called to change block size.
*/
- if (fault_is_perm && vma_pagesize == fault_granule) {
+ if (fault_is_perm && target_size == fault_granule) {
/*
* Drop the SW bits in favour of those stored in the
* PTE, which will be preserved.
@@ -1761,7 +1757,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
prot &= ~KVM_NV_GUEST_MAP_SZ;
ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
} else {
- ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, vma_pagesize,
+ ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, target_size,
__pfn_to_phys(pfn), prot,
memcache, flags);
}
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-30 16:56 ` [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
` (3 preceding siblings ...)
2025-05-05 21:06 ` Ira Weiny
@ 2025-05-09 20:54 ` James Houghton
2025-05-11 8:03 ` David Hildenbrand
4 siblings, 1 reply; 63+ messages in thread
From: James Houghton @ 2025-05-09 20:54 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta
On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct kvm_gmem *gmem = file->private_data;
> +
> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
> + return -ENODEV;
> +
> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> + (VM_SHARED | VM_MAYSHARE)) {
> + return -EINVAL;
> + }
> +
> + vm_flags_set(vma, VM_DONTDUMP);
Hi Fuad,
Sorry if I missed this, but why exactly do we set VM_DONTDUMP here?
Could you leave a small comment? (I see that it seems to have
originally come from Patrick? [1]) I get that guest memory VMAs
generally should have VM_DONTDUMP; is there a bigger reason?
[1]: https://lore.kernel.org/kvm/20240709132041.3625501-9-roypat@amazon.co.uk/#t
> + vma->vm_ops = &kvm_gmem_vm_ops;
> +
> + return 0;
> +}
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-06 20:46 ` Ackerley Tng
2025-05-08 14:12 ` Sean Christopherson
2025-05-08 14:46 ` David Hildenbrand
@ 2025-05-09 21:04 ` James Houghton
2025-05-09 22:29 ` David Hildenbrand
2 siblings, 1 reply; 63+ messages in thread
From: James Houghton @ 2025-05-09 21:04 UTC (permalink / raw)
To: Ackerley Tng
Cc: David Hildenbrand, Sean Christopherson, Vishal Annapurve,
Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai,
mpe, anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, peterx, pankaj.gupta
On Tue, May 6, 2025 at 1:47 PM Ackerley Tng <ackerleytng@google.com> wrote:
> From here [1], these changes will make it to v9
>
> + kvm_max_private_mapping_level renaming to kvm_max_gmem_mapping_level
> + kvm_mmu_faultin_pfn_private renaming to kvm_mmu_faultin_pfn_gmem
>
> > Only kvm_mmu_hugepage_adjust() must be taught to not rely on
> > fault->is_private.
> >
>
> I think fault->is_private should contribute to determining the max
> mapping level.
>
> By the time kvm_mmu_hugepage_adjust() is called,
>
> * For CoCo VMs using guest_memfd only for private memory,
>   * fault->is_private would have been checked to align with
>     kvm->mem_attr_array, so
> * For CoCo VMs using guest_memfd for both private/shared memory,
>   * fault->is_private would have been checked to align with
>     guest_memfd's shareability
> * For non-CoCo VMs using guest_memfd,
>   * fault->is_private would be false
I'm not sure exactly which thread to respond to, but it seems like the
idea now is to have a *VM* flag determine if shared faults use gmem or
use the user mappings. It seems more natural for that to be a property
of the memslot / a *memslot* flag.
Sean, Fuad, what do you think? I don't see any downsides, and it seems
strictly more flexible.
> [1] https://lore.kernel.org/all/20250430165655.605595-7-tabba@google.com/
> [2] https://lore.kernel.org/all/diqz1pt1sfw8.fsf@ackerleytng-ctop.c.googlers.com/
* Re: [PATCH v8 11/13] KVM: arm64: Enable mapping guest_memfd in arm64
2025-04-30 16:56 ` [PATCH v8 11/13] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
@ 2025-05-09 21:08 ` James Houghton
2025-05-12 6:55 ` Fuad Tabba
0 siblings, 1 reply; 63+ messages in thread
From: James Houghton @ 2025-05-09 21:08 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta
On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
> +#ifdef CONFIG_KVM_GMEM
> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> +{
> + return IS_ENABLED(CONFIG_KVM_GMEM);
How about just `return true;`? :)
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-09 21:04 ` James Houghton
@ 2025-05-09 22:29 ` David Hildenbrand
2025-05-09 22:38 ` James Houghton
0 siblings, 1 reply; 63+ messages in thread
From: David Hildenbrand @ 2025-05-09 22:29 UTC (permalink / raw)
To: James Houghton, Ackerley Tng
Cc: Sean Christopherson, Vishal Annapurve, Fuad Tabba, kvm,
linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, peterx, pankaj.gupta
On 09.05.25 23:04, James Houghton wrote:
> On Tue, May 6, 2025 at 1:47 PM Ackerley Tng <ackerleytng@google.com> wrote:
>> From here [1], these changes will make it to v9
>>
>> + kvm_max_private_mapping_level renaming to kvm_max_gmem_mapping_level
>> + kvm_mmu_faultin_pfn_private renaming to kvm_mmu_faultin_pfn_gmem
>>
>>> Only kvm_mmu_hugepage_adjust() must be taught to not rely on
>>> fault->is_private.
>>>
>>
>> I think fault->is_private should contribute to determining the max
>> mapping level.
>>
>> By the time kvm_mmu_hugepage_adjust() is called,
>>
>> * For CoCo VMs using guest_memfd only for private memory,
>>   * fault->is_private would have been checked to align with
>>     kvm->mem_attr_array, so
>> * For CoCo VMs using guest_memfd for both private/shared memory,
>>   * fault->is_private would have been checked to align with
>>     guest_memfd's shareability
>> * For non-CoCo VMs using guest_memfd,
>>   * fault->is_private would be false
>
> I'm not sure exactly which thread to respond to, but it seems like the
> idea now is to have a *VM* flag determine if shared faults use gmem or
> use the user mappings. It seems more natural for that to be a property
> of the memslot / a *memslot* flag.
I think that's exactly what we discussed in the last meetings. The
guest_memfd flag essentially defines that.
So it's not strictly a memslot flag but rather a guest_memfd flag, and
the memslot is configured with that guest_memfd, inheriting that flag.
There might be a VM capability, whether it supports creation of these
new guest_memfds (iow, guest_memfd understands the new flag).
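From the VMM side that would look something like this (a sketch; the
capability and flag names follow this series and the discussion above,
and guest_memory_size is whatever the VMM chose):

	struct kvm_create_guest_memfd gmem = {
		.size = guest_memory_size,
	};
	int gmem_fd;

	/* Only ask for in-place shared memory if this KVM supports it. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM) > 0)
		gmem.flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);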
--
Cheers,
David / dhildenb
* Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
2025-05-09 22:29 ` David Hildenbrand
@ 2025-05-09 22:38 ` James Houghton
0 siblings, 0 replies; 63+ messages in thread
From: James Houghton @ 2025-05-09 22:38 UTC (permalink / raw)
To: David Hildenbrand
Cc: Ackerley Tng, Sean Christopherson, Vishal Annapurve, Fuad Tabba,
kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, peterx, pankaj.gupta
On Fri, May 9, 2025 at 3:29 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.05.25 23:04, James Houghton wrote:
> > On Tue, May 6, 2025 at 1:47 PM Ackerley Tng <ackerleytng@google.com> wrote:
> >> From here [1], these changes will make it to v9
> >>
> >> + kvm_max_private_mapping_level renaming to kvm_max_gmem_mapping_level
> >> + kvm_mmu_faultin_pfn_private renaming to kvm_mmu_faultin_pfn_gmem
> >>
> >>> Only kvm_mmu_hugepage_adjust() must be taught to not rely on
> >>> fault->is_private.
> >>>
> >>
> >> I think fault->is_private should contribute to determining the max
> >> mapping level.
> >>
> >> By the time kvm_mmu_hugepage_adjust() is called,
> >>
> >> * For CoCo VMs using guest_memfd only for private memory,
> >>   * fault->is_private would have been checked to align with
> >>     kvm->mem_attr_array, so
> >> * For CoCo VMs using guest_memfd for both private/shared memory,
> >>   * fault->is_private would have been checked to align with
> >>     guest_memfd's shareability
> >> * For non-CoCo VMs using guest_memfd,
> >>   * fault->is_private would be false
> >
> > I'm not sure exactly which thread to respond to, but it seems like the
> > idea now is to have a *VM* flag determine if shared faults use gmem or
> > use the user mappings. It seems more natural for that to be a property
> > of the memslot / a *memslot* flag.
>
> I think that's exactly what we discussed in the last meetings. The
> guest_memfd flag essentially defines that.
>
> So it's not strictly a memslot flag but rather a guest_memfd flag, and
> the memslot is configured with that guest_memfd, inheriting that flag.
>
> There might be a VM capability, whether it supports creation of these
> new guest_memfds (iow, guest_memfd understands the new flag).
Oh yeah, I remember now, thanks for clearing that up for me. And I can
see it in the notes from last week's guest_memfd meeting. :)
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-05-09 20:54 ` James Houghton
@ 2025-05-11 8:03 ` David Hildenbrand
2025-05-12 7:08 ` Fuad Tabba
2025-05-12 7:46 ` Roy, Patrick
0 siblings, 2 replies; 63+ messages in thread
From: David Hildenbrand @ 2025-05-11 8:03 UTC (permalink / raw)
To: James Houghton, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta
On 09.05.25 22:54, James Houghton wrote:
> On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> + struct kvm_gmem *gmem = file->private_data;
>> +
>> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
>> + return -ENODEV;
>> +
>> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>> + (VM_SHARED | VM_MAYSHARE)) {
>> + return -EINVAL;
>> + }
>> +
>> + vm_flags_set(vma, VM_DONTDUMP);
>
> Hi Fuad,
>
> Sorry if I missed this, but why exactly do we set VM_DONTDUMP here?
> Could you leave a small comment? (I see that it seems to have
> originally come from Patrick? [1]) I get that guest memory VMAs
> generally should have VM_DONTDUMP; is there a bigger reason?
(David replying)
I assume because we might have inaccessible parts in there that SIGBUS
on access.
get_dump_page() does ignore any errors, though (returning NULL), so
likely we don't need VM_DONTDUMP.
--
Cheers,
David / dhildenb
* Re: [PATCH v8 11/13] KVM: arm64: Enable mapping guest_memfd in arm64
2025-05-09 21:08 ` James Houghton
@ 2025-05-12 6:55 ` Fuad Tabba
0 siblings, 0 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-05-12 6:55 UTC (permalink / raw)
To: James Houghton
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx, pankaj.gupta
On Fri, 9 May 2025 at 22:08, James Houghton <jthoughton@google.com> wrote:
>
> On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
> > +#ifdef CONFIG_KVM_GMEM
> > +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> > +{
> > + return IS_ENABLED(CONFIG_KVM_GMEM);
>
> How about just `return true;`? :)
Ack.
Thanks!
/fuad
* Re: [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults
2025-05-09 20:15 ` James Houghton
@ 2025-05-12 7:07 ` Fuad Tabba
0 siblings, 0 replies; 63+ messages in thread
From: Fuad Tabba @ 2025-05-12 7:07 UTC (permalink / raw)
To: James Houghton
Cc: ackerleytng, akpm, amoorthy, anup, aou, brauner, catalin.marinas,
chao.p.peng, chenhuacai, david, dmatlack, fvdl, hch, hughd,
isaku.yamahata, isaku.yamahata, james.morse, jarkko, jgg,
jhubbard, keirf, kirill.shutemov, kvm, liam.merwick,
linux-arm-msm, linux-mm, mail, maz, mic, michael.roth, mpe,
oliver.upton, palmer, pankaj.gupta, paul.walmsley, pbonzini,
peterx, qperret, quic_cvanscha, quic_eberman, quic_mnalajal,
quic_pderrin, quic_pheragu, quic_svaddagi, quic_tsoni, rientjes,
roypat, seanjc, shuah, steven.price, suzuki.poulose, vannapurve,
vbabka, viro, wei.w.wang, will, willy, xiaoyao.li, yilun.xu,
yuzenghui
Hi James,
On Fri, 9 May 2025 at 21:15, James Houghton <jthoughton@google.com> wrote:
>
> On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > Add arm64 support for handling guest page faults on guest_memfd
> > backed memslots.
> >
> > For now, the fault granule is restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > arch/arm64/kvm/mmu.c | 65 +++++++++++++++++++++++++++-------------
> > include/linux/kvm_host.h | 5 ++++
> > virt/kvm/kvm_main.c | 5 ----
> > 3 files changed, 50 insertions(+), 25 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 148a97c129de..d1044c7f78bb 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1466,6 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> > return vma->vm_flags & VM_MTE_ALLOWED;
> > }
> >
> > +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> > + gfn_t gfn, bool write_fault, bool *writable,
> > + struct page **page, bool is_gmem)
> > +{
> > + kvm_pfn_t pfn;
> > + int ret;
> > +
> > + if (!is_gmem)
> > + return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> > +
> > + *writable = false;
> > +
> > + ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> > + if (!ret) {
> > + *writable = !memslot_is_readonly(slot);
> > + return pfn;
> > + }
> > +
> > + if (ret == -EHWPOISON)
> > + return KVM_PFN_ERR_HWPOISON;
> > +
> > + return KVM_PFN_ERR_NOSLOT_MASK;
> > +}
> > +
> > static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > struct kvm_s2_trans *nested,
> > struct kvm_memory_slot *memslot, unsigned long hva,
> > @@ -1473,19 +1497,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > {
> > int ret = 0;
> > bool write_fault, writable;
> > - bool exec_fault, mte_allowed;
> > + bool exec_fault, mte_allowed = false;
> > bool device = false, vfio_allow_any_uc = false;
> > unsigned long mmu_seq;
> > phys_addr_t ipa = fault_ipa;
> > struct kvm *kvm = vcpu->kvm;
> > - struct vm_area_struct *vma;
> > + struct vm_area_struct *vma = NULL;
> > short vma_shift;
> > void *memcache;
> > - gfn_t gfn;
> > + gfn_t gfn = ipa >> PAGE_SHIFT;
> > kvm_pfn_t pfn;
> > bool logging_active = memslot_is_logging(memslot);
> > - bool force_pte = logging_active || is_protected_kvm_enabled();
> > - long vma_pagesize, fault_granule;
> > + bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
> > + bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> > + long vma_pagesize, fault_granule = PAGE_SIZE;
> > enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > struct kvm_pgtable *pgt;
> > struct page *page;
> > @@ -1522,16 +1547,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > return ret;
> > }
> >
> > + mmap_read_lock(current->mm);
>
> We don't have to take the mmap_lock for gmem faults, right?
>
> I think we should reorganize user_mem_abort() a bit (and I think vma_pagesize
> and maybe vma_shift should be renamed) given the changes we're making here.
Good point.
> Below is a diff that I think might be a little cleaner. Let me know what you
> think.
>
> > +
> > /*
> > * Let's check if we will get back a huge page backed by hugetlbfs, or
> > * get block mapping for device MMIO region.
> > */
> > - mmap_read_lock(current->mm);
> > - vma = vma_lookup(current->mm, hva);
> > - if (unlikely(!vma)) {
> > - kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > - mmap_read_unlock(current->mm);
> > - return -EFAULT;
> > + if (!is_gmem) {
> > + vma = vma_lookup(current->mm, hva);
> > + if (unlikely(!vma)) {
> > + kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > + mmap_read_unlock(current->mm);
> > + return -EFAULT;
> > + }
> > +
> > + vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > + mte_allowed = kvm_vma_mte_allowed(vma);
> > }
> >
> > if (force_pte)
> > @@ -1602,18 +1633,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > ipa &= ~(vma_pagesize - 1);
> > }
> >
> > - gfn = ipa >> PAGE_SHIFT;
> > - mte_allowed = kvm_vma_mte_allowed(vma);
> > -
> > - vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > -
> > /* Don't use the VMA after the unlock -- it may have vanished */
> > vma = NULL;
> >
> > /*
> > * Read mmu_invalidate_seq so that KVM can detect if the results of
> > - * vma_lookup() or __kvm_faultin_pfn() become stale prior to
> > - * acquiring kvm->mmu_lock.
> > + * vma_lookup() or faultin_pfn() become stale prior to acquiring
> > + * kvm->mmu_lock.
> > *
> > * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> > * with the smp_wmb() in kvm_mmu_invalidate_end().
> > @@ -1621,8 +1647,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> > mmap_read_unlock(current->mm);
> >
> > - pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> > - &writable, &page);
> > + pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
> > if (pfn == KVM_PFN_ERR_HWPOISON) {
>
> I think we need to take care to handle HWPOISON properly. I know that it is
> (or will most likely be) the case that GUP(hva) --> pfn, but with gmem,
> it *might* not be the case. So the following line isn't right.
>
> I think we need to handle HWPOISON for gmem using memory fault exits instead of
> sending a SIGBUS to userspace. This would be consistent with how KVM/x86
> today handles getting a HWPOISON page back from kvm_gmem_get_pfn(). I'm not
> entirely sure how KVM/x86 is meant to handle HWPOISON on shared gmem pages yet;
> I need to keep reading your series.
You're right. In the next respin (coming soon), Ackerley has added a
patch that performs a best-effort check to ensure that hva matches the
gfn.
> The reorganization diff below leaves this unfixed.
>
> > kvm_send_hwpoison_signal(hva, vma_shift);
> > return 0;
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f3af6bff3232..1b2e4e9a7802 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
> > return gfn_to_memslot(kvm, gfn)->id;
> > }
> >
> > +static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > +{
> > + return slot->flags & KVM_MEM_READONLY;
> > +}
> > +
> > static inline gfn_t
> > hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
> > {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index c75d8e188eb7..d9bca5ba19dc 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2640,11 +2640,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
> > return size;
> > }
> >
> > -static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > -{
> > - return slot->flags & KVM_MEM_READONLY;
> > -}
> > -
> > static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
> > gfn_t *nr_pages, bool write)
> > {
> > --
> > 2.49.0.901.g37484f566f-goog
>
> Thanks, Fuad! Here's the reorganization/rename diff:
Thank you James. This is very helpful.
Cheers,
/fuad
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d1044c7f78bba..c9eb72fe9013b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1502,7 +1502,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> unsigned long mmu_seq;
> phys_addr_t ipa = fault_ipa;
> struct kvm *kvm = vcpu->kvm;
> - struct vm_area_struct *vma = NULL;
> short vma_shift;
> void *memcache;
> gfn_t gfn = ipa >> PAGE_SHIFT;
> @@ -1510,7 +1509,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> bool logging_active = memslot_is_logging(memslot);
> bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
> bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> - long vma_pagesize, fault_granule = PAGE_SIZE;
> + long target_size = PAGE_SIZE, fault_granule = PAGE_SIZE;
> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> struct kvm_pgtable *pgt;
> struct page *page;
> @@ -1547,13 +1546,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> return ret;
> }
>
> - mmap_read_lock(current->mm);
> -
> /*
> * Let's check if we will get back a huge page backed by hugetlbfs, or
> * get block mapping for device MMIO region.
> */
> if (!is_gmem) {
> + struct vm_area_struct *vma = NULL;
> +
> + mmap_read_lock(current->mm);
> +
> vma = vma_lookup(current->mm, hva);
> if (unlikely(!vma)) {
> kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> @@ -1563,38 +1564,45 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> mte_allowed = kvm_vma_mte_allowed(vma);
> - }
> -
> - if (force_pte)
> - vma_shift = PAGE_SHIFT;
> - else
> - vma_shift = get_vma_page_shift(vma, hva);
> + vma_shift = force_pte ? PAGE_SHIFT : get_vma_page_shift(vma, hva);
>
> - switch (vma_shift) {
> + switch (vma_shift) {
> #ifndef __PAGETABLE_PMD_FOLDED
> - case PUD_SHIFT:
> - if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> - break;
> - fallthrough;
> + case PUD_SHIFT:
> + if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> + break;
> + fallthrough;
> #endif
> - case CONT_PMD_SHIFT:
> - vma_shift = PMD_SHIFT;
> - fallthrough;
> - case PMD_SHIFT:
> - if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> + case CONT_PMD_SHIFT:
> + vma_shift = PMD_SHIFT;
> + fallthrough;
> + case PMD_SHIFT:
> + if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> + break;
> + fallthrough;
> + case CONT_PTE_SHIFT:
> + vma_shift = PAGE_SHIFT;
> + force_pte = true;
> + fallthrough;
> + case PAGE_SHIFT:
> break;
> - fallthrough;
> - case CONT_PTE_SHIFT:
> - vma_shift = PAGE_SHIFT;
> - force_pte = true;
> - fallthrough;
> - case PAGE_SHIFT:
> - break;
> - default:
> - WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
> - }
> + default:
> + WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
> + }
>
> - vma_pagesize = 1UL << vma_shift;
> + /*
> + * Read mmu_invalidate_seq so that KVM can detect if the results of
> + * vma_lookup() or faultin_pfn() become stale prior to acquiring
> + * kvm->mmu_lock.
> + *
> + * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> + * with the smp_wmb() in kvm_mmu_invalidate_end().
> + */
> + mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> + mmap_read_unlock(current->mm);
> +
> + target_size = 1UL << vma_shift;
> + }
>
> if (nested) {
> unsigned long max_map_size;
> @@ -1620,7 +1628,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> max_map_size = PAGE_SIZE;
>
> force_pte = (max_map_size == PAGE_SIZE);
> - vma_pagesize = min(vma_pagesize, (long)max_map_size);
> + target_size = min(target_size, (long)max_map_size);
> }
>
> /*
> @@ -1628,27 +1636,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> * ensure we find the right PFN and lay down the mapping in the right
> * place.
> */
> - if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> - fault_ipa &= ~(vma_pagesize - 1);
> - ipa &= ~(vma_pagesize - 1);
> + if (target_size == PMD_SIZE || target_size == PUD_SIZE) {
> + fault_ipa &= ~(target_size - 1);
> + ipa &= ~(target_size - 1);
> }
>
> - /* Don't use the VMA after the unlock -- it may have vanished */
> - vma = NULL;
> -
> - /*
> - * Read mmu_invalidate_seq so that KVM can detect if the results of
> - * vma_lookup() or faultin_pfn() become stale prior to acquiring
> - * kvm->mmu_lock.
> - *
> - * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> - * with the smp_wmb() in kvm_mmu_invalidate_end().
> - */
> - mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> - mmap_read_unlock(current->mm);
> -
> pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
> if (pfn == KVM_PFN_ERR_HWPOISON) {
> + // TODO: Handle gmem properly. vma_shift
> + // intentionally left uninitialized.
> kvm_send_hwpoison_signal(hva, vma_shift);
> return 0;
> }
> @@ -1658,9 +1654,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (kvm_is_device_pfn(pfn)) {
> /*
> * If the page was identified as device early by looking at
> - * the VMA flags, vma_pagesize is already representing the
> + * the VMA flags, target_size is already representing the
> * largest quantity we can map. If instead it was mapped
> - * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
> + * via __kvm_faultin_pfn(), target_size is set to PAGE_SIZE
> * and must not be upgraded.
> *
> * In both cases, we don't let transparent_hugepage_adjust()
> @@ -1699,7 +1695,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> kvm_fault_lock(kvm);
> pgt = vcpu->arch.hw_mmu->pgt;
> - if (mmu_invalidate_retry(kvm, mmu_seq)) {
> + if (!is_gmem && mmu_invalidate_retry(kvm, mmu_seq)) {
> ret = -EAGAIN;
> goto out_unlock;
> }
> @@ -1708,16 +1704,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> * If we are not forced to use page mapping, check if we are
> * backed by a THP and thus use block mapping if possible.
> */
> - if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
> + if (target_size == PAGE_SIZE && !(force_pte || device)) {
> if (fault_is_perm && fault_granule > PAGE_SIZE)
> - vma_pagesize = fault_granule;
> - else
> - vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
> + target_size = fault_granule;
> + else if (!is_gmem)
> + target_size = transparent_hugepage_adjust(kvm, memslot,
> hva, &pfn,
> &fault_ipa);
>
> - if (vma_pagesize < 0) {
> - ret = vma_pagesize;
> + if (target_size < 0) {
> + ret = target_size;
> goto out_unlock;
> }
> }
> @@ -1725,7 +1721,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
> /* Check the VMM hasn't introduced a new disallowed VMA */
> if (mte_allowed) {
> - sanitise_mte_tags(kvm, pfn, vma_pagesize);
> + sanitise_mte_tags(kvm, pfn, target_size);
> } else {
> ret = -EFAULT;
> goto out_unlock;
> @@ -1750,10 +1746,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> /*
> * Under the premise of getting a FSC_PERM fault, we just need to relax
> - * permissions only if vma_pagesize equals fault_granule. Otherwise,
> + * permissions only if target_size equals fault_granule. Otherwise,
> * kvm_pgtable_stage2_map() should be called to change block size.
> */
> - if (fault_is_perm && vma_pagesize == fault_granule) {
> + if (fault_is_perm && target_size == fault_granule) {
> /*
> * Drop the SW bits in favour of those stored in the
> * PTE, which will be preserved.
> @@ -1761,7 +1757,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> prot &= ~KVM_NV_GUEST_MAP_SZ;
> ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
> } else {
> - ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, vma_pagesize,
> + ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, target_size,
> __pfn_to_phys(pfn), prot,
> memcache, flags);
> }
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-05-11 8:03 ` David Hildenbrand
@ 2025-05-12 7:08 ` Fuad Tabba
2025-05-12 19:29 ` James Houghton
2025-05-12 7:46 ` Roy, Patrick
1 sibling, 1 reply; 63+ messages in thread
From: Fuad Tabba @ 2025-05-12 7:08 UTC (permalink / raw)
To: David Hildenbrand
Cc: James Houghton, kvm, linux-arm-msm, linux-mm, pbonzini,
chenhuacai, mpe, anup, paul.walmsley, palmer, aou, seanjc, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
pankaj.gupta
Hi James.
On Sun, 11 May 2025 at 09:03, David Hildenbrand <david@redhat.com> wrote:
>
> On 09.05.25 22:54, James Houghton wrote:
> > On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
> >> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> >> +{
> >> + struct kvm_gmem *gmem = file->private_data;
> >> +
> >> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
> >> + return -ENODEV;
> >> +
> >> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> >> + (VM_SHARED | VM_MAYSHARE)) {
> >> + return -EINVAL;
> >> + }
> >> +
> >> + vm_flags_set(vma, VM_DONTDUMP);
> >
> > Hi Fuad,
> >
> > Sorry if I missed this, but why exactly do we set VM_DONTDUMP here?
> > Could you leave a small comment? (I see that it seems to have
> > originally come from Patrick? [1]) I get that guest memory VMAs
> > generally should have VM_DONTDUMP; is there a bigger reason?
>
> (David replying)
>
> I assume because we might have inaccessible parts in there that SIGBUS
> on access.
That was my thinking.
> get_dump_page() does ignore any errors, though (returning NULL), so
> likely we don't need VM_DONTDUMP.
In which case I'll remove this from the next respin.
Thanks,
/fuad
> --
> Cheers,
>
> David / dhildenb
>
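
For reference, this is roughly what the handler looks like with
VM_DONTDUMP dropped, as agreed above (a sketch based only on the snippet
quoted in this thread; the vm_ops assignment and its name are
assumptions, not quoted code):

static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct kvm_gmem *gmem = file->private_data;

        if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
                return -ENODEV;

        /* guest_memfd mappings must be shared mappings. */
        if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
            (VM_SHARED | VM_MAYSHARE))
                return -EINVAL;

        /*
         * No vm_flags_set(vma, VM_DONTDUMP): per the discussion above,
         * get_dump_page() already tolerates inaccessible pages by
         * returning NULL.
         */
        vma->vm_ops = &kvm_gmem_vm_ops;        /* assumed name */

        return 0;
}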
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-05-11 8:03 ` David Hildenbrand
2025-05-12 7:08 ` Fuad Tabba
@ 2025-05-12 7:46 ` Roy, Patrick
1 sibling, 0 replies; 63+ messages in thread
From: Roy, Patrick @ 2025-05-12 7:46 UTC (permalink / raw)
To: david@redhat.com
Cc: ackerleytng@google.com, akpm@linux-foundation.org,
amoorthy@google.com, anup@brainfault.org, aou@eecs.berkeley.edu,
brauner@kernel.org, catalin.marinas@arm.com,
chao.p.peng@linux.intel.com, chenhuacai@kernel.org,
dmatlack@google.com, fvdl@google.com, hch@infradead.org,
hughd@google.com, isaku.yamahata@gmail.com,
isaku.yamahata@intel.com, james.morse@arm.com, jarkko@kernel.org,
jgg@nvidia.com, jhubbard@nvidia.com, jthoughton@google.com,
keirf@google.com, kirill.shutemov@linux.intel.com,
kvm@vger.kernel.org, liam.merwick@oracle.com,
linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
michael.roth@amd.com, mpe@ellerman.id.au, oliver.upton@linux.dev,
palmer@dabbelt.com, pankaj.gupta@amd.com,
paul.walmsley@sifive.com, pbonzini@redhat.com, peterx@redhat.com,
qperret@google.com, quic_cvanscha@quicinc.com,
quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com,
rientjes@google.com, Roy, Patrick, seanjc@google.com,
shuah@kernel.org, steven.price@arm.com, suzuki.poulose@arm.com,
tabba@google.com, vannapurve@google.com, vbabka@suse.cz,
viro@zeniv.linux.org.uk, wei.w.wang@intel.com, will@kernel.org,
willy@infradead.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
yuzenghui@huawei.com
On Sun, 2025-05-11 at 09:03 +0100, David Hildenbrand wrote:
>>> return -ENODEV;
>>> +
>>> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>>> + (VM_SHARED | VM_MAYSHARE)) {
>>> + return -EINVAL;
>>> + }
>>> +
>>> + vm_flags_set(vma, VM_DONTDUMP);
>>
>> Hi Fuad,
>>
>> Sorry if I missed this, but why exactly do we set VM_DONTDUMP here?
>> Could you leave a small comment? (I see that it seems to have
>> originally come from Patrick? [1]) I get that guest memory VMAs
>> generally should have VM_DONTDUMP; is there a bigger reason?
IIRC, I essentially copied my mmap handler from secretmem for that RFC. But
even for direct map removal, it seems this is not needed, because get_dump_page()
goes via GUP, which errors out for direct-map-removed VMAs. So what David is
saying below also applies in that case.
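
The behavior described here looks roughly like this in the dump path (an
approximate, paraphrased sketch of the core-dump loop, not verbatim
kernel code):

/*
 * A page that GUP refuses (e.g. direct-map-removed, or
 * SIGBUS-on-access gmem) makes get_dump_page() return NULL, and the
 * dumper emits a zero-filled hole instead of faulting, so
 * VM_DONTDUMP is redundant for correctness.
 */
for (addr = start; addr < end; addr += PAGE_SIZE) {
        struct page *page = get_dump_page(addr);

        if (page) {
                dump_emit_page(cprm, page);
                put_page(page);
        } else {
                dump_skip(cprm, PAGE_SIZE);
        }
}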
> (David replying)
>
> I assume because we might have inaccessible parts in there that SIGBUS
> on access.
>
> get_dump_page() does ignore any errors, though (returning NULL), so
> likely we don't need VM_DONTDUMP.
>
> --
> Cheers,
>
> David / dhildenb
Best,
Patrick
* Re: [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-05-12 7:08 ` Fuad Tabba
@ 2025-05-12 19:29 ` James Houghton
0 siblings, 0 replies; 63+ messages in thread
From: James Houghton @ 2025-05-12 19:29 UTC (permalink / raw)
To: Fuad Tabba
Cc: David Hildenbrand, kvm, linux-arm-msm, linux-mm, pbonzini,
chenhuacai, mpe, anup, paul.walmsley, palmer, aou, seanjc, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
pankaj.gupta
On Mon, May 12, 2025 at 12:09 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi James.
>
> On Sun, 11 May 2025 at 09:03, David Hildenbrand <david@redhat.com> wrote:
> >
> > On 09.05.25 22:54, James Houghton wrote:
> > > On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
> > >> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > >> +{
> > >> + struct kvm_gmem *gmem = file->private_data;
> > >> +
> > >> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
> > >> + return -ENODEV;
> > >> +
> > >> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > >> + (VM_SHARED | VM_MAYSHARE)) {
> > >> + return -EINVAL;
> > >> + }
> > >> +
> > >> + vm_flags_set(vma, VM_DONTDUMP);
> > >
> > > Hi Fuad,
> > >
> > > Sorry if I missed this, but why exactly do we set VM_DONTDUMP here?
> > > Could you leave a small comment? (I see that it seems to have
> > > originally come from Patrick? [1]) I get that guest memory VMAs
> > > generally should have VM_DONTDUMP; is there a bigger reason?
> >
> > (David replying)
> >
> > I assume because we might have inaccessible parts in there that SIGBUS
> > on access.
>
> That was my thinking.
>
> > get_dump_page() does ignore any errors, though (returning NULL), so
> > likely we don't need VM_DONTDUMP.
>
> In which case I'll remove this from the next respin.
SGTM, thanks!
Userspace could remove VM_DONTDUMP by doing MADV_DODUMP, which is why
I was curious about this.
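
For context, that escape hatch is plain madvise(); a minimal userspace
example (illustrative only, with error handling reduced to perror()):

#include <sys/mman.h>
#include <stdio.h>

/*
 * Re-include a region in core dumps after the kernel marked it
 * VM_DONTDUMP at mmap() time.
 */
static void allow_dump(void *addr, size_t len)
{
        if (madvise(addr, len, MADV_DODUMP))
                perror("madvise(MADV_DODUMP)");
}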
And thanks for the extra context[1], Patrick. :)
[1]: https://lore.kernel.org/kvm/20250512074615.27394-1-roypat@amazon.co.uk/
Thread overview: 63+ messages
2025-04-30 16:56 [PATCH v8 00/13] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 01/13] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-05-01 17:38 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 02/13] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-05-01 18:10 ` Ira Weiny
2025-05-02 6:44 ` David Hildenbrand
2025-05-02 14:24 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 03/13] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
2025-05-01 18:18 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 04/13] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
2025-05-01 18:19 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 05/13] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-05-01 21:37 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups Fuad Tabba
2025-04-30 18:58 ` Ackerley Tng
2025-05-01 9:53 ` Fuad Tabba
2025-05-02 15:04 ` David Hildenbrand
2025-05-02 16:21 ` Sean Christopherson
2025-05-02 22:00 ` Ackerley Tng
2025-05-05 8:01 ` David Hildenbrand
2025-05-05 22:57 ` Sean Christopherson
2025-05-06 5:17 ` Vishal Annapurve
2025-05-06 5:28 ` Vishal Annapurve
2025-05-06 13:58 ` Sean Christopherson
2025-05-06 14:15 ` David Hildenbrand
2025-05-06 20:46 ` Ackerley Tng
2025-05-08 14:12 ` Sean Christopherson
2025-05-08 14:46 ` David Hildenbrand
2025-05-09 21:04 ` James Houghton
2025-05-09 22:29 ` David Hildenbrand
2025-05-09 22:38 ` James Houghton
2025-05-06 19:27 ` Ackerley Tng
2025-05-05 23:09 ` Ackerley Tng
2025-05-05 23:17 ` Sean Christopherson
2025-05-01 21:38 ` Ira Weiny
2025-04-30 16:56 ` [PATCH v8 07/13] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-04-30 21:30 ` David Hildenbrand
2025-05-01 21:43 ` Ira Weiny
2025-05-02 12:07 ` Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 08/13] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
2025-04-30 21:33 ` David Hildenbrand
2025-05-01 8:07 ` Fuad Tabba
2025-05-02 15:11 ` David Hildenbrand
2025-05-02 22:06 ` Ackerley Tng
2025-05-02 22:29 ` Ackerley Tng
2025-05-06 8:47 ` Yan Zhao
2025-05-05 21:06 ` Ira Weiny
2025-05-06 12:15 ` Fuad Tabba
2025-05-09 20:54 ` James Houghton
2025-05-11 8:03 ` David Hildenbrand
2025-05-12 7:08 ` Fuad Tabba
2025-05-12 19:29 ` James Houghton
2025-05-12 7:46 ` Roy, Patrick
2025-04-30 16:56 ` [PATCH v8 09/13] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
2025-04-30 21:35 ` David Hildenbrand
2025-04-30 16:56 ` [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
2025-05-09 20:15 ` James Houghton
2025-05-12 7:07 ` Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 11/13] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
2025-05-09 21:08 ` James Houghton
2025-05-12 6:55 ` Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 12/13] KVM: x86: KVM_X86_SW_PROTECTED_VM to support guest_memfd shared memory Fuad Tabba
2025-04-30 16:56 ` [PATCH v8 13/13] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba