* [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess
@ 2025-09-26 16:31 Sean Christopherson
  2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
                   ` (5 more replies)
  0 siblings, 6 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-26 16:31 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson, Ackerley Tng

Add a guest_memfd flag, DEFAULT_SHARED, to let userspace explicitly state
whether the underlying memory should default to private vs. shared.  As-is,
the default state is implicitly derived from the MMAP flag: guest_memfd
without MMAP is private, and with MMAP is shared.  That implicit behavior
is going to create a mess of an ABI once in-place conversion support comes
along.

If the default state is implicit, then x86 CoCo VMs will end up with default
state that varies based on whether or not a guest_memfd instance is
configured for mmap() support.  To avoid breaking guest<=>host ABI for CoCo
VMs when utilizing in-place conversion, i.e. MMAP, userspace would need to
immediately convert all memory from shared=>private.

Ackerley's RFC for in-place conversion fudged around this by adding a flag
to let userspace set the default to _private_, but that will result in a
messy and hard to document ABI.  For x86 CoCo VMs, memory would be private
by default, unless MMAP but not INIT_PRIVATE is specified.  For everything
else, memory would be shared by default, sort of?  Because without MMAP,
the memory would be inaccessible, leading to a Schrödinger's cat situation.

Since odds are very good we'll end up with a flag of some kind, add one now
(for 6.18) so that the default state is explicit and simple: without
DEFAULT_SHARED == private, with DEFAULT_SHARED == shared.

As a bonus, this allows for adding test coverage that KVM rejects faults to
private memory.
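
To make the ABI difference concrete, here is a minimal userspace sketch (not
kernel code) that contrasts today's implicit rule with the explicit rule
proposed by this series.  The flag values mirror the uapi definitions added in
patch 1; the two helper names are made up purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the uapi definitions added in patch 1. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_DEFAULT_SHARED	(1ULL << 1)

/* Hypothetical helper: the implicit rule, default state derived from MMAP. */
static bool implicit_default_is_shared(uint64_t flags)
{
	return !!(flags & GUEST_MEMFD_FLAG_MMAP);
}

/* Hypothetical helper: the explicit rule, default state has its own flag. */
static bool explicit_default_is_shared(uint64_t flags)
{
	return !!(flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED);
}
```

Under the implicit rule, a CoCo VMM that starts passing MMAP to use in-place
conversion would see its default flip from private to shared.  Under the
explicit rule, passing MMAP alone leaves the default private.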

Ackerley Tng (1):
  KVM: selftests: Add test coverage for guest_memfd without
    GUEST_MEMFD_FLAG_MMAP

Sean Christopherson (5):
  KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if
    not set
  KVM: selftests: Stash the host page size in a global in the
    guest_memfd test
  KVM: selftests: Create a new guest_memfd for each testcase
  KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  KVM: selftests: Verify that faulting in private guest_memfd memory
    fails

 Documentation/virt/kvm/api.rst                |  10 +-
 include/uapi/linux/kvm.h                      |   3 +-
 .../testing/selftests/kvm/guest_memfd_test.c  | 162 +++++++++++-------
 .../testing/selftests/kvm/include/kvm_util.h  |  25 +++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  44 ++---
 tools/testing/selftests/kvm/mmu_stress_test.c |   5 +-
 .../selftests/kvm/s390/ucontrol_test.c        |  16 +-
 .../selftests/kvm/set_memory_region_test.c    |  17 +-
 virt/kvm/guest_memfd.c                        |   6 +-
 9 files changed, 169 insertions(+), 119 deletions(-)

base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
-- 
2.51.0.536.g15c5d4f767-goog


* [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
@ 2025-09-26 16:31 ` Sean Christopherson
  2025-09-29  8:38   ` David Hildenbrand
  2025-09-29  9:04   ` Fuad Tabba
  2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-26 16:31 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson, Ackerley Tng

Add a guest_memfd flag to allow userspace to state that the underlying
memory should be configured to be shared by default, and reject user page
faults if the guest_memfd instance's memory isn't shared by default.
Because KVM doesn't yet support in-place private<=>shared conversions, all
guest_memfd memory effectively follows the default state.

Alternatively, KVM could deduce the default state based on MMAP, which for
all intents and purposes is what KVM currently does.  However, implicitly
deriving the default state based on MMAP will result in a messy ABI when
support for in-place conversions is added.

For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
by default (otherwise the memory would be unusable).  If MMAP implies
memory is shared by default, then the default state for CoCo VMs will vary
based on MMAP, and from userspace's perspective, will change when in-place
conversion support is added.  I.e. to maintain guest<=>host ABI, userspace
would need to immediately convert all memory from shared=>private, which
is both ugly and inefficient.  The inefficiency could be avoided by adding
a flag to state that memory is _private_ by default, irrespective of MMAP,
but that would lead to an equally messy and hard to document ABI.

Bite the bullet and immediately add a flag to control the default state so
that the effective behavior is explicit and straightforward.
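
The fault-time behavior described above can be modelled in plain userspace C.
This is only a sketch of the decision logic, assuming a toy struct in place of
the inode and a toy enum in place of vm_fault_t; none of these names exist in
the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the uapi definitions added by this patch. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_DEFAULT_SHARED	(1ULL << 1)

/* Hypothetical stand-in for the kernel's vm_fault_t results. */
enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* Hypothetical model of a guest_memfd instance: only its size and flags. */
struct gmem_model {
	uint64_t size;
	uint64_t flags;
};

static enum fault_result model_user_fault(const struct gmem_model *gmem,
					  uint64_t offset)
{
	/* Faults at or beyond the end of the file are rejected. */
	if (offset >= gmem->size)
		return FAULT_SIGBUS;

	/*
	 * Without in-place conversion, all memory follows the default state,
	 * so memory that isn't shared by default can never be faulted in.
	 */
	if (!(gmem->flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
		return FAULT_SIGBUS;

	return FAULT_OK;
}
```

In this model, an instance created with MMAP but without DEFAULT_SHARED can
still be mmap()'d, but every access through the mapping takes the SIGBUS path.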

Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
Cc: David Hildenbrand <david@redhat.com>
Cc: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst                 | 10 ++++++++--
 include/uapi/linux/kvm.h                       |  3 ++-
 tools/testing/selftests/kvm/guest_memfd_test.c |  5 +++--
 virt/kvm/guest_memfd.c                         |  6 +++++-
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c17a87a0a5ac..4dfe156bbe3c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
 a single guest_memfd file, but the bound ranges must not overlap).
 
 When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
-supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
-enables mmap() and faulting of guest_memfd memory to host userspace.
+supports GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_DEFAULT_SHARED.  Setting
+the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
+memory to host userspace (so long as the memory is currently shared).  Setting
+DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
+by default).  Note!  Because KVM doesn't yet support in-place private<=>shared
+conversions, DEFAULT_SHARED must be specified in order to fault memory into
+userspace page tables.  This limitation will go away when in-place conversions
+are supported.
 
 When the KVM MMU performs a PFN lookup to service a guest fault and the backing
 guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6efa98a57ec1..38a2c083b6aa 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
-#define GUEST_MEMFD_FLAG_MMAP	(1ULL << 0)
+#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
+#define GUEST_MEMFD_FLAG_DEFAULT_SHARED	(1ULL << 1)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index b3ca6737f304..81b11a958c7a 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -274,7 +274,7 @@ static void test_guest_memfd(unsigned long vm_type)
 	vm = vm_create_barebones_type(vm_type);
 
 	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
-		flags |= GUEST_MEMFD_FLAG_MMAP;
+		flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
 
 	test_create_guest_memfd_multiple(vm);
 	test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
@@ -337,7 +337,8 @@ static void test_guest_memfd_guest(void)
 		    "Default VM type should always support guest_memfd mmap()");
 
 	size = vm->page_size;
-	fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
+	fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
+					     GUEST_MEMFD_FLAG_DEFAULT_SHARED);
 	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
 
 	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 08a6bc7d25b6..19f05a45be04 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
 	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
 		return VM_FAULT_SIGBUS;
 
+	if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
+		return VM_FAULT_SIGBUS;
+
 	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
 	if (IS_ERR(folio)) {
 		int err = PTR_ERR(folio);
@@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 valid_flags = 0;
 
 	if (kvm_arch_supports_gmem_mmap(kvm))
-		valid_flags |= GUEST_MEMFD_FLAG_MMAP;
+		valid_flags |= GUEST_MEMFD_FLAG_MMAP |
+			       GUEST_MEMFD_FLAG_DEFAULT_SHARED;
 
 	if (flags & ~valid_flags)
 		return -EINVAL;
-- 
2.51.0.536.g15c5d4f767-goog



* [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test
  2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
  2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
@ 2025-09-26 16:31 ` Sean Christopherson
  2025-09-29  9:12   ` Fuad Tabba
                     ` (2 more replies)
  2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-26 16:31 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson, Ackerley Tng

Use a global variable to track the host page size in the guest_memfd test
so that the information doesn't need to be constantly passed around.  The
state is purely a reflection of the underlying system, i.e. can't be set
by the test and is constant for a given invocation of the test, and thus
explicitly passing the host page size to individual testcases adds no
value, e.g. doesn't allow testing different combinations.

Making page_size a global will simplify an upcoming change to create a new
guest_memfd instance per testcase.

No functional change intended.
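
As a standalone illustration of the pattern (outside the selftests framework),
the sketch below initializes a file-scope page_size once and lets helpers read
it directly.  init_page_size() and test_total_size() are hypothetical names;
the patch itself simply assigns getpagesize() in main().

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Set once at startup, read-only afterwards. */
static size_t page_size;

/* Hypothetical helper; the patch does this assignment directly in main(). */
static void init_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	assert(sz > 0);
	page_size = (size_t)sz;
}

/* Helpers consume the global instead of taking a page_size parameter. */
static size_t test_total_size(void)
{
	return page_size * 4;
}
```

Call init_page_size() before any testcase runs; nothing else writes the global.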

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 37 +++++++++----------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 81b11a958c7a..8251d019206a 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -24,6 +24,8 @@
 #include "test_util.h"
 #include "ucall_common.h"
 
+static size_t page_size;
+
 static void test_file_read_write(int fd)
 {
 	char buf[64];
@@ -38,7 +40,7 @@ static void test_file_read_write(int fd)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
+static void test_mmap_supported(int fd, size_t total_size)
 {
 	const char val = 0xaa;
 	char *mem;
@@ -78,7 +80,7 @@ void fault_sigbus_handler(int signum)
 	siglongjmp(jmpbuf, 1);
 }
 
-static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
+static void test_fault_overflow(int fd, size_t total_size)
 {
 	struct sigaction sa_old, sa_new = {
 		.sa_handler = fault_sigbus_handler,
@@ -106,7 +108,7 @@ static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT(!ret, "munmap() should succeed.");
 }
 
-static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
+static void test_mmap_not_supported(int fd, size_t total_size)
 {
 	char *mem;
 
@@ -117,7 +119,7 @@ static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT_EQ(mem, MAP_FAILED);
 }
 
-static void test_file_size(int fd, size_t page_size, size_t total_size)
+static void test_file_size(int fd, size_t total_size)
 {
 	struct stat sb;
 	int ret;
@@ -128,7 +130,7 @@ static void test_file_size(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT_EQ(sb.st_blksize, page_size);
 }
 
-static void test_fallocate(int fd, size_t page_size, size_t total_size)
+static void test_fallocate(int fd, size_t total_size)
 {
 	int ret;
 
@@ -165,7 +167,7 @@ static void test_fallocate(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed");
 }
 
-static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
+static void test_invalid_punch_hole(int fd, size_t total_size)
 {
 	struct {
 		off_t offset;
@@ -196,8 +198,7 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
 }
 
 static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
-						  uint64_t guest_memfd_flags,
-						  size_t page_size)
+						  uint64_t guest_memfd_flags)
 {
 	size_t size;
 	int fd;
@@ -214,7 +215,6 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 {
 	int fd1, fd2, ret;
 	struct stat st1, st2;
-	size_t page_size = getpagesize();
 
 	fd1 = __vm_create_guest_memfd(vm, page_size, 0);
 	TEST_ASSERT(fd1 != -1, "memfd creation should succeed");
@@ -241,7 +241,6 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 
 static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
 {
-	size_t page_size = getpagesize();
 	uint64_t flag;
 	int fd;
 
@@ -265,10 +264,8 @@ static void test_guest_memfd(unsigned long vm_type)
 	uint64_t flags = 0;
 	struct kvm_vm *vm;
 	size_t total_size;
-	size_t page_size;
 	int fd;
 
-	page_size = getpagesize();
 	total_size = page_size * 4;
 
 	vm = vm_create_barebones_type(vm_type);
@@ -277,22 +274,22 @@ static void test_guest_memfd(unsigned long vm_type)
 		flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
 
 	test_create_guest_memfd_multiple(vm);
-	test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
+	test_create_guest_memfd_invalid_sizes(vm, flags);
 
 	fd = vm_create_guest_memfd(vm, total_size, flags);
 
 	test_file_read_write(fd);
 
 	if (flags & GUEST_MEMFD_FLAG_MMAP) {
-		test_mmap_supported(fd, page_size, total_size);
-		test_fault_overflow(fd, page_size, total_size);
+		test_mmap_supported(fd, total_size);
+		test_fault_overflow(fd, total_size);
 	} else {
-		test_mmap_not_supported(fd, page_size, total_size);
+		test_mmap_not_supported(fd, total_size);
 	}
 
-	test_file_size(fd, page_size, total_size);
-	test_fallocate(fd, page_size, total_size);
-	test_invalid_punch_hole(fd, page_size, total_size);
+	test_file_size(fd, total_size);
+	test_fallocate(fd, total_size);
+	test_invalid_punch_hole(fd, total_size);
 
 	test_guest_memfd_flags(vm, flags);
 
@@ -367,6 +364,8 @@ int main(int argc, char *argv[])
 
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
 
+	page_size = getpagesize();
+
 	/*
 	 * Not all architectures support KVM_CAP_VM_TYPES. However, those that
 	 * support guest_memfd have that support for the default VM type.
-- 
2.51.0.536.g15c5d4f767-goog



* [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase
  2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
  2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
  2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
@ 2025-09-26 16:31 ` Sean Christopherson
  2025-09-29  9:18   ` David Hildenbrand
                     ` (2 more replies)
  2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-26 16:31 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson, Ackerley Tng

Refactor the guest_memfd selftest to improve test isolation by creating a
new guest_memfd for each testcase.  Currently, the test reuses a single
guest_memfd instance for all testcases, and thus creates dependencies
between tests, e.g. not truncating folios from the guest_memfd instance
at the end of a test could lead to unexpected results (see the PUNCH_HOLE
purging that needs to be done by the in-flight NUMA testcases[1]).

Invoke each test via a macro wrapper to create and close a guest_memfd
to cut down on the boilerplate copy+paste needed to create a test.
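
The create/run/close pattern can be tried outside of KVM with an ordinary
file.  The sketch below is an approximation under stated assumptions: an
unlinked temporary file stands in for a guest_memfd, and create_scratch_fd(),
scratch_test() and both testcases are hypothetical names, not selftest APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for vm_create_guest_memfd(): an unlinked temp file. */
static int create_scratch_fd(size_t size)
{
	char path[] = "/tmp/gmem_sketch_XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0 || unlink(path) || ftruncate(fd, (off_t)size))
		abort();
	return fd;
}

/* Same shape as the patch's macro: fresh fd per test, closed afterwards. */
#define scratch_test(__test, __size)			\
do {							\
	int fd = create_scratch_fd(__size);		\
							\
	test_##__test(fd, __size);			\
	close(fd);					\
} while (0)

/* Example testcase that dirties the file and doesn't clean up. */
static void test_write_byte(int fd, size_t total_size)
{
	unsigned char c = 0xaa;

	(void)total_size;
	if (pwrite(fd, &c, 1, 0) != 1)
		abort();
}

/* Example testcase that relies on a pristine, zero-filled file. */
static void test_starts_zeroed(int fd, size_t total_size)
{
	unsigned char c = 1;
	struct stat sb;

	if (fstat(fd, &sb) || (size_t)sb.st_size != total_size)
		abort();
	if (pread(fd, &c, 1, 0) != 1 || c != 0)
		abort();
}
```

Running scratch_test(write_byte, 4096) followed by
scratch_test(starts_zeroed, 4096) passes only because the second test gets its
own file; with a single shared fd, the leftover byte would trip the check.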

Link: https://lore.kernel.org/all/20250827175247.83322-10-shivankg@amd.com [1]
Reported-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 31 ++++++++++---------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 8251d019206a..60c6dec63490 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -26,7 +26,7 @@
 
 static size_t page_size;
 
-static void test_file_read_write(int fd)
+static void test_file_read_write(int fd, size_t total_size)
 {
 	char buf[64];
 
@@ -259,14 +259,18 @@ static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
 	}
 }
 
+#define gmem_test(__test, __vm, __flags)				\
+do {									\
+	int fd = vm_create_guest_memfd(__vm, page_size * 4, __flags);	\
+									\
+	test_##__test(fd, page_size * 4);				\
+	close(fd);							\
+} while (0)
+
 static void test_guest_memfd(unsigned long vm_type)
 {
 	uint64_t flags = 0;
 	struct kvm_vm *vm;
-	size_t total_size;
-	int fd;
-
-	total_size = page_size * 4;
 
 	vm = vm_create_barebones_type(vm_type);
 
@@ -276,24 +280,21 @@ static void test_guest_memfd(unsigned long vm_type)
 	test_create_guest_memfd_multiple(vm);
 	test_create_guest_memfd_invalid_sizes(vm, flags);
 
-	fd = vm_create_guest_memfd(vm, total_size, flags);
-
-	test_file_read_write(fd);
+	gmem_test(file_read_write, vm, flags);
 
 	if (flags & GUEST_MEMFD_FLAG_MMAP) {
-		test_mmap_supported(fd, total_size);
-		test_fault_overflow(fd, total_size);
+		gmem_test(mmap_supported, vm, flags);
+		gmem_test(fault_overflow, vm, flags);
 	} else {
-		test_mmap_not_supported(fd, total_size);
+		gmem_test(mmap_not_supported, vm, flags);
 	}
 
-	test_file_size(fd, total_size);
-	test_fallocate(fd, total_size);
-	test_invalid_punch_hole(fd, total_size);
+	gmem_test(file_size, vm, flags);
+	gmem_test(fallocate, vm, flags);
+	gmem_test(invalid_punch_hole, vm, flags);
 
 	test_guest_memfd_flags(vm, flags);
 
-	close(fd);
 	kvm_vm_free(vm);
 }
 
-- 
2.51.0.536.g15c5d4f767-goog



* [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP
  2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
@ 2025-09-26 16:31 ` Sean Christopherson
  2025-09-29  9:21   ` David Hildenbrand
  2025-09-29  9:24   ` Fuad Tabba
  2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
  2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
  5 siblings, 2 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-26 16:31 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson, Ackerley Tng

From: Ackerley Tng <ackerleytng@google.com>

If a VM type supports KVM_CAP_GUEST_MEMFD_MMAP, the guest_memfd test will
run all test cases with GUEST_MEMFD_FLAG_MMAP set.  This leaves the code
path for creating a non-mmap()-able guest_memfd on a VM that supports
mappable guest memfds untested.

Refactor the test to run the main test suite with a given set of flags.
Then, for VM types that support the mappable capability, invoke the test
suite twice: once with no flags, and once with GUEST_MEMFD_FLAG_MMAP and
GUEST_MEMFD_FLAG_DEFAULT_SHARED set.

This ensures both creation paths are properly exercised on capable VMs.

test_guest_memfd_flags() tests valid flags, hence it can be run just once
per VM type, and valid flag identification can be moved into the test
function.
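
The resulting control flow can be sketched without KVM.  In the model below,
run_suite() stands in for __test_guest_memfd() and merely records the flags it
was given, and a bool replaces the vm_check_cap() query.  All names are
hypothetical; the flag values mirror the uapi definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_DEFAULT_SHARED	(1ULL << 1)

#define MAX_RUNS 4

/* Log of the flag sets the suite was invoked with. */
static uint64_t runs[MAX_RUNS];
static int nr_runs;

/* Hypothetical stand-in for __test_guest_memfd(): records its flags. */
static void run_suite(uint64_t flags)
{
	assert(nr_runs < MAX_RUNS);
	runs[nr_runs++] = flags;
}

/* Hypothetical stand-in for test_guest_memfd(). */
static void run_all(bool has_mmap_cap)
{
	/* The non-mmap()-able path is always exercised. */
	run_suite(0);

	/* The mappable path is exercised only when the VM type supports it. */
	if (has_mmap_cap)
		run_suite(GUEST_MEMFD_FLAG_MMAP |
			  GUEST_MEMFD_FLAG_DEFAULT_SHARED);
}
```

A VM type without the capability gets one pass, and a capable VM type gets
two, so the flags == 0 creation path is no longer skipped on capable VMs.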

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
[sean: use double-underscores for the inner helper]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 30 ++++++++++++-------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 60c6dec63490..5a50a28ce1fa 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -239,11 +239,16 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
 
-static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
+static void test_guest_memfd_flags(struct kvm_vm *vm)
 {
+	uint64_t valid_flags = 0;
 	uint64_t flag;
 	int fd;
 
+	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
+		valid_flags |= GUEST_MEMFD_FLAG_MMAP |
+			       GUEST_MEMFD_FLAG_DEFAULT_SHARED;
+
 	for (flag = BIT(0); flag; flag <<= 1) {
 		fd = __vm_create_guest_memfd(vm, page_size, flag);
 		if (flag & valid_flags) {
@@ -267,16 +272,8 @@ do {									\
 	close(fd);							\
 } while (0)
 
-static void test_guest_memfd(unsigned long vm_type)
+static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
 {
-	uint64_t flags = 0;
-	struct kvm_vm *vm;
-
-	vm = vm_create_barebones_type(vm_type);
-
-	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
-		flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
-
 	test_create_guest_memfd_multiple(vm);
 	test_create_guest_memfd_invalid_sizes(vm, flags);
 
@@ -292,8 +289,19 @@ static void test_guest_memfd(unsigned long vm_type)
 	gmem_test(file_size, vm, flags);
 	gmem_test(fallocate, vm, flags);
 	gmem_test(invalid_punch_hole, vm, flags);
+}
 
-	test_guest_memfd_flags(vm, flags);
+static void test_guest_memfd(unsigned long vm_type)
+{
+	struct kvm_vm *vm = vm_create_barebones_type(vm_type);
+
+	test_guest_memfd_flags(vm);
+
+	__test_guest_memfd(vm, 0);
+
+	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
+		__test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP |
+				       GUEST_MEMFD_FLAG_DEFAULT_SHARED);
 
 	kvm_vm_free(vm);
 }
-- 
2.51.0.536.g15c5d4f767-goog



* [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
@ 2025-09-26 16:31 ` Sean Christopherson
  2025-09-29  9:24   ` Fuad Tabba
                     ` (2 more replies)
  2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
  5 siblings, 3 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-26 16:31 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson, Ackerley Tng

Add and use wrappers for mmap() and munmap() that assert success to
eliminate a significant amount of boilerplate code, to ensure all tests
assert on failure, and to provide consistent error messages on failure.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 21 +++------
 .../testing/selftests/kvm/include/kvm_util.h  | 25 +++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 44 +++++++------------
 tools/testing/selftests/kvm/mmu_stress_test.c |  5 +--
 .../selftests/kvm/s390/ucontrol_test.c        | 16 +++----
 .../selftests/kvm/set_memory_region_test.c    | 17 ++++---
 6 files changed, 64 insertions(+), 64 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 5a50a28ce1fa..5dd40b77dc07 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -50,8 +50,7 @@ static void test_mmap_supported(int fd, size_t total_size)
 	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
 	TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
 
-	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+	mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
 
 	memset(mem, val, total_size);
 	for (i = 0; i < total_size; i++)
@@ -70,8 +69,7 @@ static void test_mmap_supported(int fd, size_t total_size)
 	for (i = 0; i < total_size; i++)
 		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
 
-	ret = munmap(mem, total_size);
-	TEST_ASSERT(!ret, "munmap() should succeed.");
+	kvm_munmap(mem, total_size);
 }
 
 static sigjmp_buf jmpbuf;
@@ -89,10 +87,8 @@ static void test_fault_overflow(int fd, size_t total_size)
 	const char val = 0xaa;
 	char *mem;
 	size_t i;
-	int ret;
 
-	mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+	mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
 
 	sigaction(SIGBUS, &sa_new, &sa_old);
 	if (sigsetjmp(jmpbuf, 1) == 0) {
@@ -104,8 +100,7 @@ static void test_fault_overflow(int fd, size_t total_size)
 	for (i = 0; i < total_size; i++)
 		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
 
-	ret = munmap(mem, map_size);
-	TEST_ASSERT(!ret, "munmap() should succeed.");
+	kvm_munmap(mem, map_size);
 }
 
 static void test_mmap_not_supported(int fd, size_t total_size)
@@ -347,10 +342,9 @@ static void test_guest_memfd_guest(void)
 					     GUEST_MEMFD_FLAG_DEFAULT_SHARED);
 	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
 
-	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	TEST_ASSERT(mem != MAP_FAILED, "mmap() on guest_memfd failed");
+	mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
 	memset(mem, 0xaa, size);
-	munmap(mem, size);
+	kvm_munmap(mem, size);
 
 	virt_pg_map(vm, gpa, gpa);
 	vcpu_args_set(vcpu, 2, gpa, size);
@@ -358,8 +352,7 @@ static void test_guest_memfd_guest(void)
 
 	TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
 
-	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	TEST_ASSERT(mem != MAP_FAILED, "mmap() on guest_memfd failed");
+	mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
 	for (i = 0; i < size; i++)
 		TEST_ASSERT_EQ(mem[i], 0xff);
 
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 23a506d7eca3..1c68ff0fb3fb 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -278,6 +278,31 @@ static inline bool kvm_has_cap(long cap)
 #define __KVM_SYSCALL_ERROR(_name, _ret) \
 	"%s failed, rc: %i errno: %i (%s)", (_name), (_ret), errno, strerror(errno)
 
+static inline void *__kvm_mmap(size_t size, int prot, int flags, int fd,
+			       off_t offset)
+{
+	void *mem;
+
+	mem = mmap(NULL, size, prot, flags, fd, offset);
+	TEST_ASSERT(mem != MAP_FAILED, __KVM_SYSCALL_ERROR("mmap()",
+		    (int)(unsigned long)MAP_FAILED));
+
+	return mem;
+}
+
+static inline void *kvm_mmap(size_t size, int prot, int flags, int fd)
+{
+	return __kvm_mmap(size, prot, flags, fd, 0);
+}
+
+static inline void kvm_munmap(void *mem, size_t size)
+{
+	int ret;
+
+	ret = munmap(mem, size);
+	TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
+}
+
 /*
  * Use the "inner", double-underscore macro when reporting errors from within
  * other macros so that the name of ioctl() and not its literal numeric value
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index c3f5142b0a54..da754b152c11 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -770,13 +770,11 @@ static void vm_vcpu_rm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
 	int ret;
 
 	if (vcpu->dirty_gfns) {
-		ret = munmap(vcpu->dirty_gfns, vm->dirty_ring_size);
-		TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
+		kvm_munmap(vcpu->dirty_gfns, vm->dirty_ring_size);
 		vcpu->dirty_gfns = NULL;
 	}
 
-	ret = munmap(vcpu->run, vcpu_mmap_sz());
-	TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
+	kvm_munmap(vcpu->run, vcpu_mmap_sz());
 
 	ret = close(vcpu->fd);
 	TEST_ASSERT(!ret,  __KVM_SYSCALL_ERROR("close()", ret));
@@ -810,20 +808,16 @@ void kvm_vm_release(struct kvm_vm *vmp)
 static void __vm_mem_region_delete(struct kvm_vm *vm,
 				   struct userspace_mem_region *region)
 {
-	int ret;
-
 	rb_erase(&region->gpa_node, &vm->regions.gpa_tree);
 	rb_erase(&region->hva_node, &vm->regions.hva_tree);
 	hash_del(&region->slot_node);
 
 	sparsebit_free(&region->unused_phy_pages);
 	sparsebit_free(&region->protected_phy_pages);
-	ret = munmap(region->mmap_start, region->mmap_size);
-	TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
+	kvm_munmap(region->mmap_start, region->mmap_size);
 	if (region->fd >= 0) {
 		/* There's an extra map when using shared memory. */
-		ret = munmap(region->mmap_alias, region->mmap_size);
-		TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
+		kvm_munmap(region->mmap_alias, region->mmap_size);
 		close(region->fd);
 	}
 	if (region->region.guest_memfd >= 0)
@@ -1080,12 +1074,9 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 		region->fd = kvm_memfd_alloc(region->mmap_size,
 					     src_type == VM_MEM_SRC_SHARED_HUGETLB);
 
-	region->mmap_start = mmap(NULL, region->mmap_size,
-				  PROT_READ | PROT_WRITE,
-				  vm_mem_backing_src_alias(src_type)->flag,
-				  region->fd, 0);
-	TEST_ASSERT(region->mmap_start != MAP_FAILED,
-		    __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED));
+	region->mmap_start = kvm_mmap(region->mmap_size, PROT_READ | PROT_WRITE,
+				      vm_mem_backing_src_alias(src_type)->flag,
+				      region->fd);
 
 	TEST_ASSERT(!is_backing_src_hugetlb(src_type) ||
 		    region->mmap_start == align_ptr_up(region->mmap_start, backing_src_pagesz),
@@ -1156,12 +1147,10 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 
 	/* If shared memory, create an alias. */
 	if (region->fd >= 0) {
-		region->mmap_alias = mmap(NULL, region->mmap_size,
-					  PROT_READ | PROT_WRITE,
-					  vm_mem_backing_src_alias(src_type)->flag,
-					  region->fd, 0);
-		TEST_ASSERT(region->mmap_alias != MAP_FAILED,
-			    __KVM_SYSCALL_ERROR("mmap()",  (int)(unsigned long)MAP_FAILED));
+		region->mmap_alias = kvm_mmap(region->mmap_size,
+					      PROT_READ | PROT_WRITE,
+					      vm_mem_backing_src_alias(src_type)->flag,
+					      region->fd);
 
 		/* Align host alias address */
 		region->host_alias = align_ptr_up(region->mmap_alias, alignment);
@@ -1371,10 +1360,8 @@ struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)
 	TEST_ASSERT(vcpu_mmap_sz() >= sizeof(*vcpu->run), "vcpu mmap size "
 		"smaller than expected, vcpu_mmap_sz: %i expected_min: %zi",
 		vcpu_mmap_sz(), sizeof(*vcpu->run));
-	vcpu->run = (struct kvm_run *) mmap(NULL, vcpu_mmap_sz(),
-		PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd, 0);
-	TEST_ASSERT(vcpu->run != MAP_FAILED,
-		    __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED));
+	vcpu->run = kvm_mmap(vcpu_mmap_sz(), PROT_READ | PROT_WRITE,
+			     MAP_SHARED, vcpu->fd);
 
 	if (kvm_has_cap(KVM_CAP_BINARY_STATS_FD))
 		vcpu->stats.fd = vcpu_get_stats_fd(vcpu);
@@ -1821,9 +1808,8 @@ void *vcpu_map_dirty_ring(struct kvm_vcpu *vcpu)
 			    page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
 		TEST_ASSERT(addr == MAP_FAILED, "Dirty ring mapped exec");
 
-		addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd,
-			    page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
-		TEST_ASSERT(addr != MAP_FAILED, "Dirty ring map failed");
+		addr = __kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd,
+				  page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
 
 		vcpu->dirty_gfns = addr;
 		vcpu->dirty_gfns_count = size / sizeof(struct kvm_dirty_gfn);
diff --git a/tools/testing/selftests/kvm/mmu_stress_test.c b/tools/testing/selftests/kvm/mmu_stress_test.c
index 6a437d2be9fa..37b7e6524533 100644
--- a/tools/testing/selftests/kvm/mmu_stress_test.c
+++ b/tools/testing/selftests/kvm/mmu_stress_test.c
@@ -339,8 +339,7 @@ int main(int argc, char *argv[])
 	TEST_ASSERT(max_gpa > (4 * slot_size), "MAXPHYADDR <4gb ");
 
 	fd = kvm_memfd_alloc(slot_size, hugepages);
-	mem = mmap(NULL, slot_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	TEST_ASSERT(mem != MAP_FAILED, "mmap() failed");
+	mem = kvm_mmap(slot_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
 
 	TEST_ASSERT(!madvise(mem, slot_size, MADV_NOHUGEPAGE), "madvise() failed");
 
@@ -413,7 +412,7 @@ int main(int argc, char *argv[])
 	for (slot = (slot - 1) & ~1ull; slot >= first_slot; slot -= 2)
 		vm_set_user_memory_region(vm, slot, 0, 0, 0, NULL);
 
-	munmap(mem, slot_size / 2);
+	kvm_munmap(mem, slot_size / 2);
 
 	/* Sanity check that the vCPUs actually ran. */
 	for (i = 0; i < nr_vcpus; i++)
diff --git a/tools/testing/selftests/kvm/s390/ucontrol_test.c b/tools/testing/selftests/kvm/s390/ucontrol_test.c
index d265b34c54be..50bc1c38225a 100644
--- a/tools/testing/selftests/kvm/s390/ucontrol_test.c
+++ b/tools/testing/selftests/kvm/s390/ucontrol_test.c
@@ -142,19 +142,17 @@ FIXTURE_SETUP(uc_kvm)
 	self->kvm_run_size = ioctl(self->kvm_fd, KVM_GET_VCPU_MMAP_SIZE, NULL);
 	ASSERT_GE(self->kvm_run_size, sizeof(struct kvm_run))
 		  TH_LOG(KVM_IOCTL_ERROR(KVM_GET_VCPU_MMAP_SIZE, self->kvm_run_size));
-	self->run = (struct kvm_run *)mmap(NULL, self->kvm_run_size,
-		    PROT_READ | PROT_WRITE, MAP_SHARED, self->vcpu_fd, 0);
-	ASSERT_NE(self->run, MAP_FAILED);
+	self->run = kvm_mmap(self->kvm_run_size, PROT_READ | PROT_WRITE,
+			     MAP_SHARED, self->vcpu_fd);
 	/**
 	 * For virtual cpus that have been created with S390 user controlled
 	 * virtual machines, the resulting vcpu fd can be memory mapped at page
 	 * offset KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of
 	 * the virtual cpu's hardware control block.
 	 */
-	self->sie_block = (struct kvm_s390_sie_block *)mmap(NULL, PAGE_SIZE,
-			  PROT_READ | PROT_WRITE, MAP_SHARED,
-			  self->vcpu_fd, KVM_S390_SIE_PAGE_OFFSET << PAGE_SHIFT);
-	ASSERT_NE(self->sie_block, MAP_FAILED);
+	self->sie_block = __kvm_mmap(PAGE_SIZE, PROT_READ | PROT_WRITE,
+				     MAP_SHARED, self->vcpu_fd,
+				     KVM_S390_SIE_PAGE_OFFSET << PAGE_SHIFT);
 
 	TH_LOG("VM created %p %p", self->run, self->sie_block);
 
@@ -186,8 +184,8 @@ FIXTURE_SETUP(uc_kvm)
 
 FIXTURE_TEARDOWN(uc_kvm)
 {
-	munmap(self->sie_block, PAGE_SIZE);
-	munmap(self->run, self->kvm_run_size);
+	kvm_munmap(self->sie_block, PAGE_SIZE);
+	kvm_munmap(self->run, self->kvm_run_size);
 	close(self->vcpu_fd);
 	close(self->vm_fd);
 	close(self->kvm_fd);
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index ce3ac0fd6dfb..7fe427ff9b38 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -433,10 +433,10 @@ static void test_add_max_memory_regions(void)
 	pr_info("Adding slots 0..%i, each memory region with %dK size\n",
 		(max_mem_slots - 1), MEM_REGION_SIZE >> 10);
 
-	mem = mmap(NULL, (size_t)max_mem_slots * MEM_REGION_SIZE + alignment,
-		   PROT_READ | PROT_WRITE,
-		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
-	TEST_ASSERT(mem != MAP_FAILED, "Failed to mmap() host");
+
+	mem = kvm_mmap((size_t)max_mem_slots * MEM_REGION_SIZE + alignment,
+		       PROT_READ | PROT_WRITE,
+		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1);
 	mem_aligned = (void *)(((size_t) mem + alignment - 1) & ~(alignment - 1));
 
 	for (slot = 0; slot < max_mem_slots; slot++)
@@ -446,9 +446,8 @@ static void test_add_max_memory_regions(void)
 					  mem_aligned + (uint64_t)slot * MEM_REGION_SIZE);
 
 	/* Check it cannot be added memory slots beyond the limit */
-	mem_extra = mmap(NULL, MEM_REGION_SIZE, PROT_READ | PROT_WRITE,
-			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-	TEST_ASSERT(mem_extra != MAP_FAILED, "Failed to mmap() host");
+	mem_extra = kvm_mmap(MEM_REGION_SIZE, PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, -1);
 
 	ret = __vm_set_user_memory_region(vm, max_mem_slots, 0,
 					  (uint64_t)max_mem_slots * MEM_REGION_SIZE,
@@ -456,8 +455,8 @@ static void test_add_max_memory_regions(void)
 	TEST_ASSERT(ret == -1 && errno == EINVAL,
 		    "Adding one more memory slot should fail with EINVAL");
 
-	munmap(mem, (size_t)max_mem_slots * MEM_REGION_SIZE + alignment);
-	munmap(mem_extra, MEM_REGION_SIZE);
+	kvm_munmap(mem, (size_t)max_mem_slots * MEM_REGION_SIZE + alignment);
+	kvm_munmap(mem_extra, MEM_REGION_SIZE);
 	kvm_vm_free(vm);
 }
 
-- 
2.51.0.536.g15c5d4f767-goog
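
For readers who only see the call sites converted above, here is a minimal
userspace sketch of the "assert success" wrapper pattern.  The names
checked_mmap(), checked_mmap_offset() and checked_munmap() are hypothetical
stand-ins, not the selftest helpers themselves; the real kvm_mmap(),
__kvm_mmap() and kvm_munmap() are defined earlier in this patch and report
failures via TEST_ASSERT() rather than abort().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical analogue of __kvm_mmap(): map with an explicit file offset
 * and abort on failure instead of handing MAP_FAILED back to the caller.
 */
static void *checked_mmap_offset(size_t size, int prot, int flags, int fd,
				 off_t offset)
{
	void *mem = mmap(NULL, size, prot, flags, fd, offset);

	if (mem == MAP_FAILED) {
		perror("mmap()");
		abort();
	}
	return mem;
}

/* Hypothetical analogue of kvm_mmap(): the common case of offset 0. */
static void *checked_mmap(size_t size, int prot, int flags, int fd)
{
	return checked_mmap_offset(size, prot, flags, fd, 0);
}

/* Hypothetical analogue of kvm_munmap(): abort if the unmap fails. */
static void checked_munmap(void *mem, size_t size)
{
	if (munmap(mem, size)) {
		perror("munmap()");
		abort();
	}
}
```

Splitting out the offset variant keeps the common call sites short, and only
the few users that map at a non-zero offset (the dirty ring and the s390 SIE
block above) need the longer form.  Note that several of the converted
munmap() calls were previously unchecked, so the conversion also adds
failure checking there.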



* [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
@ 2025-09-26 16:31 ` Sean Christopherson
  2025-09-29  9:24   ` Fuad Tabba
                     ` (2 more replies)
  5 siblings, 3 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-26 16:31 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson, Ackerley Tng

Add a guest_memfd testcase to verify that faulting in private memory gets
a SIGBUS.  For now, test only the case where memory is private by default
since KVM doesn't yet support in-place conversion.

Cc: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 62 ++++++++++++++-----
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 5dd40b77dc07..b5a631aca933 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -40,17 +40,26 @@ static void test_file_read_write(int fd, size_t total_size)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap_supported(int fd, size_t total_size)
+static void *test_mmap_common(int fd, size_t size)
 {
-	const char val = 0xaa;
-	char *mem;
-	size_t i;
-	int ret;
+	void *mem;
 
-	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
 	TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
 
-	mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
+	mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
+
+	return mem;
+}
+
+static void test_mmap_supported(int fd, size_t total_size)
+{
+	const char val = 0xaa;
+	char *mem;
+	size_t i;
+	int ret;
+
+	mem = test_mmap_common(fd, total_size);
 
 	memset(mem, val, total_size);
 	for (i = 0; i < total_size; i++)
@@ -78,31 +87,47 @@ void fault_sigbus_handler(int signum)
 	siglongjmp(jmpbuf, 1);
 }
 
-static void test_fault_overflow(int fd, size_t total_size)
+static void *test_fault_sigbus(int fd, size_t size)
 {
 	struct sigaction sa_old, sa_new = {
 		.sa_handler = fault_sigbus_handler,
 	};
-	size_t map_size = total_size * 4;
-	const char val = 0xaa;
-	char *mem;
-	size_t i;
+	void *mem;
 
-	mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
+	mem = test_mmap_common(fd, size);
 
 	sigaction(SIGBUS, &sa_new, &sa_old);
 	if (sigsetjmp(jmpbuf, 1) == 0) {
-		memset(mem, 0xaa, map_size);
+		memset(mem, 0xaa, size);
 		TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
 	}
 	sigaction(SIGBUS, &sa_old, NULL);
 
+	return mem;
+}
+
+static void test_fault_overflow(int fd, size_t total_size)
+{
+	size_t map_size = total_size * 4;
+	const char val = 0xaa;
+	char *mem;
+	size_t i;
+
+	mem = test_fault_sigbus(fd, map_size);
+
 	for (i = 0; i < total_size; i++)
 		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
 
 	kvm_munmap(mem, map_size);
 }
 
+static void test_fault_private(int fd, size_t total_size)
+{
+	void *mem = test_fault_sigbus(fd, total_size);
+
+	kvm_munmap(mem, total_size);
+}
+
 static void test_mmap_not_supported(int fd, size_t total_size)
 {
 	char *mem;
@@ -274,9 +299,12 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
 
 	gmem_test(file_read_write, vm, flags);
 
-	if (flags & GUEST_MEMFD_FLAG_MMAP) {
+	if (flags & GUEST_MEMFD_FLAG_MMAP &&
+	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
 		gmem_test(mmap_supported, vm, flags);
 		gmem_test(fault_overflow, vm, flags);
+	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
+		gmem_test(fault_private, vm, flags);
 	} else {
 		gmem_test(mmap_not_supported, vm, flags);
 	}
@@ -294,9 +322,11 @@ static void test_guest_memfd(unsigned long vm_type)
 
 	__test_guest_memfd(vm, 0);
 
-	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
+	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP)) {
+		__test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP);
 		__test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP |
 				       GUEST_MEMFD_FLAG_DEFAULT_SHARED);
+	}
 
 	kvm_vm_free(vm);
 }
-- 
2.51.0.536.g15c5d4f767-goog
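
The SIGBUS-catching technique used by test_fault_sigbus() can be tried
outside of KVM.  The sketch below is an illustration only: it assumes a
plain memfd (via memfd_create()) instead of guest_memfd, and provokes SIGBUS
by touching a page that lies beyond the end of the file, which mirrors the
fault_overflow case rather than the private-memory case.  The helper names
write_raises_sigbus() and map_past_eof() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmpbuf;

static void sigbus_handler(int signum)
{
	(void)signum;
	/* Unwind back to the sigsetjmp() call, which then returns 1. */
	siglongjmp(sigbus_jmpbuf, 1);
}

/* Hypothetical helper: true if writing one byte to @addr raises SIGBUS. */
static bool write_raises_sigbus(volatile char *addr)
{
	struct sigaction sa_old, sa_new = {
		.sa_handler = sigbus_handler,
	};
	volatile bool faulted = true;

	sigaction(SIGBUS, &sa_new, &sa_old);
	/* Passing 1 saves the signal mask so that SIGBUS is unblocked again. */
	if (sigsetjmp(sigbus_jmpbuf, 1) == 0) {
		*addr = 1;
		faulted = false;	/* Only reached if the write succeeded. */
	}
	sigaction(SIGBUS, &sa_old, NULL);

	return faulted;
}

/*
 * Hypothetical helper: map two pages of a one-page memfd, so that the
 * second page of the mapping is past EOF.
 */
static char *map_past_eof(size_t page_size)
{
	int fd = memfd_create("sigbus-demo", 0);
	char *mem;
	int ret;

	assert(fd >= 0);
	ret = ftruncate(fd, page_size);
	assert(!ret);

	mem = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, 0);
	assert(mem != MAP_FAILED);
	close(fd);	/* The mapping keeps the file alive. */

	return mem;
}
```

With guest_memfd the handler and the jump logic stay the same; only the
source of the fault differs, i.e. KVM returns VM_FAULT_SIGBUS because the
memory is private rather than because the access is past the end of the
file.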



* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
@ 2025-09-29  8:38   ` David Hildenbrand
  2025-09-29  8:57     ` Fuad Tabba
  2025-09-29  9:04   ` Fuad Tabba
  1 sibling, 1 reply; 55+ messages in thread
From: David Hildenbrand @ 2025-09-29  8:38 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, Fuad Tabba, Ackerley Tng

On 26.09.25 18:31, Sean Christopherson wrote:
> Add a guest_memfd flag to allow userspace to state that the underlying
> memory should be configured to be shared by default, and reject user page
> faults if the guest_memfd instance's memory isn't shared by default.
> Because KVM doesn't yet support in-place private<=>shared conversions, all
> guest_memfd memory effectively follows the default state.

I recall we discussed exactly that in the past (e.g., on April 17) in the call:

"Current plan:
  * guest_memfd creation flag to specify “all memory starts as shared”
    * Compatible with the old behavior where all memory started as private
    * Initially, only these can be mmap (no in-place conversion)
"

> 
> Alternatively, KVM could deduce the default state based on MMAP, which for
> all intents and purposes is what KVM currently does.  However, implicitly
> deriving the default state based on MMAP will result in a messy ABI when
> support for in-place conversions is added.

I don't recall the details, but I faintly remember that we discussed later that with
mmap support, the default will be shared for now, and that no other flag would be
required for the time being.

We could always add a "DEFAULT_PRIVATE" flag when we realize that we would have
to change the default later.

Ackerley might remember more details.

-- 
Cheers

David / dhildenb



* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29  8:38   ` David Hildenbrand
@ 2025-09-29  8:57     ` Fuad Tabba
  2025-09-29  9:01       ` David Hildenbrand
  0 siblings, 1 reply; 55+ messages in thread
From: Fuad Tabba @ 2025-09-29  8:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, kvm, linux-kernel, Ackerley Tng

Hi David.

On Mon, 29 Sept 2025 at 09:38, David Hildenbrand <david@redhat.com> wrote:
>
> On 26.09.25 18:31, Sean Christopherson wrote:
> > Add a guest_memfd flag to allow userspace to state that the underlying
> > memory should be configured to be shared by default, and reject user page
> > faults if the guest_memfd instance's memory isn't shared by default.
> > Because KVM doesn't yet support in-place private<=>shared conversions, all
> > guest_memfd memory effectively follows the default state.
>
> I recall we discussed exactly that in the past (e.g., on April 17) in the call:
>
> "Current plan:
>   * guest_memfd creation flag to specify “all memory starts as shared”
>     * Compatible with the old behavior where all memory started as private
>     * Initially, only these can be mmap (no in-place conversion)
> "
>
> >
> > Alternatively, KVM could deduce the default state based on MMAP, which for
> > all intents and purposes is what KVM currently does.  However, implicitly
> > deriving the default state based on MMAP will result in a messy ABI when
> > support for in-place conversions is added.
>
> I don't recall the details, but I faintly remember that we discussed later that with
> mmap support, the default will be shared for now, and that no other flag would be
> required for the time being.
>
> We could always add a "DEFAULT_PRIVATE" flag when we realize that we would have
> to change the default later.

I remember discussing this. For many confidential computing usecases,
e.g., pKVM and TDX, it would make more sense for the default case to
be private, since it's the more common state, and the initial state.
It also makes sense since sharing is usually triggered by the guest.
Ensuring that the initial state is private reduces the chances of the
VMM forgetting to convert the memory to being private later on,
potentially exposing all guest memory from the get go.

I think it makes sense to clarify things now. Especially since with
memory attributes, the default attribute is
KVM_MEMORY_ATTRIBUTE_SHARED, which adds even more confusion.

Cheers,
/fuad



>
> Ackerley might remember more details.
>
> --
> Cheers
>
> David / dhildenb
>


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29  8:57     ` Fuad Tabba
@ 2025-09-29  9:01       ` David Hildenbrand
  0 siblings, 0 replies; 55+ messages in thread
From: David Hildenbrand @ 2025-09-29  9:01 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, kvm, linux-kernel, Ackerley Tng

On 29.09.25 10:57, Fuad Tabba wrote:
> Hi David.
> 
> On Mon, 29 Sept 2025 at 09:38, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 26.09.25 18:31, Sean Christopherson wrote:
>>> Add a guest_memfd flag to allow userspace to state that the underlying
>>> memory should be configured to be shared by default, and reject user page
>>> faults if the guest_memfd instance's memory isn't shared by default.
>>> Because KVM doesn't yet support in-place private<=>shared conversions, all
>>> guest_memfd memory effectively follows the default state.
>>
>> I recall we discussed exactly that in the past (e.g., on April 17) in the call:
>>
>> "Current plan:
>>    * guest_memfd creation flag to specify “all memory starts as shared”
>>      * Compatible with the old behavior where all memory started as private
>>      * Initially, only these can be mmap (no in-place conversion)
>> "
>>
>>>
>>> Alternatively, KVM could deduce the default state based on MMAP, which for
>>> all intents and purposes is what KVM currently does.  However, implicitly
>>> deriving the default state based on MMAP will result in a messy ABI when
>>> support for in-place conversions is added.
>>
>> I don't recall the details, but I faintly remember that we discussed later that with
>> mmap support, the default will be shared for now, and that no other flag would be
>> required for the time being.
>>
>> We could always add a "DEFAULT_PRIVATE" flag when we realize that we would have
>> to change the default later.
> 
> I remember discussing this. For many confidential computing usecases,
> e.g., pKVM and TDX, it would make more sense for the default case to
> be private, since it's the more common state, and the initial state.
> It also makes sense since sharing is usually triggered by the guest.
> Ensuring that the initial state is private reduces the chances of the
> VMM forgetting to convert the memory to being private later on,
> potentially exposing all guest memory from the get go.
> 
> I think it makes sense to clarify things now. Especially since with
> memory attributes, the default attribute is
> KVM_MEMORY_ATTRIBUTE_SHARED, which adds even more confusion.

Makes sense to me then, thanks.

-- 
Cheers

David / dhildenb



* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
  2025-09-29  8:38   ` David Hildenbrand
@ 2025-09-29  9:04   ` Fuad Tabba
  2025-09-29  9:43     ` Ackerley Tng
  1 sibling, 1 reply; 55+ messages in thread
From: Fuad Tabba @ 2025-09-29  9:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Ackerley Tng

Hi Sean,

On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>
> Add a guest_memfd flag to allow userspace to state that the underlying
> memory should be configured to be shared by default, and reject user page
> faults if the guest_memfd instance's memory isn't shared by default.
> Because KVM doesn't yet support in-place private<=>shared conversions, all
> guest_memfd memory effectively follows the default state.
>
> Alternatively, KVM could deduce the default state based on MMAP, which for
> all intents and purposes is what KVM currently does.  However, implicitly
> deriving the default state based on MMAP will result in a messy ABI when
> support for in-place conversions is added.
>
> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
> by default (otherwise the memory would be unusable).  If MMAP implies
> memory is shared by default, then the default state for CoCo VMs will vary
> based on MMAP, and from userspace's perspective, will change when in-place
> conversion support is added.  I.e. to maintain guest<=>host ABI, userspace
> would need to immediately convert all memory from shared=>private, which
> is both ugly and inefficient.  The inefficiency could be avoided by adding
> a flag to state that memory is _private_ by default, irrespective of MMAP,
> but that would lead to an equally messy and hard to document ABI.
>
> Bite the bullet and immediately add a flag to control the default state so
> that the effective behavior is explicit and straightforward.
>
> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Fuad Tabba <tabba@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/virt/kvm/api.rst                 | 10 ++++++++--
>  include/uapi/linux/kvm.h                       |  3 ++-
>  tools/testing/selftests/kvm/guest_memfd_test.c |  5 +++--
>  virt/kvm/guest_memfd.c                         |  6 +++++-
>  4 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c17a87a0a5ac..4dfe156bbe3c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
>  a single guest_memfd file, but the bound ranges must not overlap).
>
>  When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
> -supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
> -enables mmap() and faulting of guest_memfd memory to host userspace.
> +supports GUEST_MEMFD_FLAG_MMAP and  GUEST_MEMFD_FLAG_DEFAULT_SHARED.  Setting

There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.

> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
> +memory to host userspace (so long as the memory is currently shared).  Setting
> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
> +by default).  Note!  Because KVM doesn't yet support in-place private<=>shared
> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
> +userspace page tables.  This limitation will go away when in-place conversions
> +are supported.

I think that a more accurate (and future proof) description of the
mmap flag could be something along the lines of:

+ Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.

+ Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
+ by default, as opposed to private. Shared memory can be faulted into host
+ userspace page tables. Private memory cannot.
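
As a rough sketch of what that wording implies for userspace today, i.e.
before in-place conversion exists, the flag combinations map to the
outcomes below.  The enum and gmem_expected_access() are hypothetical names
used purely for illustration; only the two flag definitions come from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined by this patch in include/uapi/linux/kvm.h. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_DEFAULT_SHARED	(1ULL << 1)

/* Hypothetical: what host userspace should expect when touching the fd. */
enum gmem_user_access {
	GMEM_MMAP_FAILS,	/* mmap() itself is rejected */
	GMEM_FAULT_SIGBUS,	/* mmap() works, faulting memory gets SIGBUS */
	GMEM_FAULT_OK,		/* mmap() works, in-bounds faults succeed */
};

static enum gmem_user_access gmem_expected_access(uint64_t flags)
{
	/* Without MMAP, the file cannot be mapped at all. */
	if (!(flags & GUEST_MEMFD_FLAG_MMAP))
		return GMEM_MMAP_FAILS;

	/* Private by default: mappable, but never faultable (for now). */
	if (!(flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
		return GMEM_FAULT_SIGBUS;

	return GMEM_FAULT_OK;
}
```

Once in-place conversion lands, the middle case would depend on the current
state of each page rather than on the creation flags alone, which is why
describing the two flags separately seems more future proof.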

>  When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>  guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6efa98a57ec1..38a2c083b6aa 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>
>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> -#define GUEST_MEMFD_FLAG_MMAP  (1ULL << 0)
> +#define GUEST_MEMFD_FLAG_MMAP          (1ULL << 0)
> +#define GUEST_MEMFD_FLAG_DEFAULT_SHARED        (1ULL << 1)
>
>  struct kvm_create_guest_memfd {
>         __u64 size;
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index b3ca6737f304..81b11a958c7a 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -274,7 +274,7 @@ static void test_guest_memfd(unsigned long vm_type)
>         vm = vm_create_barebones_type(vm_type);
>
>         if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> -               flags |= GUEST_MEMFD_FLAG_MMAP;
> +               flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>
>         test_create_guest_memfd_multiple(vm);
>         test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
> @@ -337,7 +337,8 @@ static void test_guest_memfd_guest(void)
>                     "Default VM type should always support guest_memfd mmap()");
>
>         size = vm->page_size;
> -       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
> +       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
> +                                            GUEST_MEMFD_FLAG_DEFAULT_SHARED);
>         vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
>
>         mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 08a6bc7d25b6..19f05a45be04 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>         if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>                 return VM_FAULT_SIGBUS;
>
> +       if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
> +               return VM_FAULT_SIGBUS;
> +
>         folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>         if (IS_ERR(folio)) {
>                 int err = PTR_ERR(folio);
> @@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>         u64 valid_flags = 0;
>
>         if (kvm_arch_supports_gmem_mmap(kvm))
> -               valid_flags |= GUEST_MEMFD_FLAG_MMAP;
> +               valid_flags |= GUEST_MEMFD_FLAG_MMAP |
> +                              GUEST_MEMFD_FLAG_DEFAULT_SHARED;

At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
checking for that, at least until we have in-place conversion? Having
only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP,
isn't a useful combination.
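
Something along these lines is what I have in mind.  This is only a
standalone sketch for discussion: gmem_check_flags() is a hypothetical
helper, not existing KVM code, and the extra check would presumably sit
next to the existing valid_flags test in kvm_gmem_create().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as defined by this patch in include/uapi/linux/kvm.h. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_DEFAULT_SHARED	(1ULL << 1)

/* Hypothetical helper: returns 0 if @flags is acceptable, else -EINVAL. */
static int gmem_check_flags(uint64_t flags, uint64_t valid_flags)
{
	/* Existing behavior: reject any flag the VM type doesn't support. */
	if (flags & ~valid_flags)
		return -EINVAL;

	/*
	 * Suggested addition: shared-by-default memory that can never be
	 * mapped by the host isn't useful without in-place conversion.
	 */
	if ((flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) &&
	    !(flags & GUEST_MEMFD_FLAG_MMAP))
		return -EINVAL;

	return 0;
}
```

MMAP without DEFAULT_SHARED stays allowed in this sketch, since patch 6
relies on that combination to test that faulting private memory gets
SIGBUS.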

That said, these are all nits, I'll leave it to you. With that:

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad



>
>         if (flags & ~valid_flags)
>                 return -EINVAL;
> --
> 2.51.0.536.g15c5d4f767-goog
>


* Re: [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test
  2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
@ 2025-09-29  9:12   ` Fuad Tabba
  2025-09-29  9:17   ` David Hildenbrand
  2025-09-29 10:56   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: Fuad Tabba @ 2025-09-29  9:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Ackerley Tng

On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>
> Use a global variable to track the host page size in the guest_memfd test
> so that the information doesn't need to be constantly passed around.  The
> state is purely a reflection of the underlying system, i.e. can't be set
> by the test and is constant for a given invocation of the test, and thus
> explicitly passing the host page size to individual testcases adds no
> value, e.g. doesn't allow testing different combinations.
>
> Making page_size a global will simplify an upcoming change to create a new
> guest_memfd instance per testcase.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 37 +++++++++----------
>  1 file changed, 18 insertions(+), 19 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 81b11a958c7a..8251d019206a 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -24,6 +24,8 @@
>  #include "test_util.h"
>  #include "ucall_common.h"
>
> +static size_t page_size;
> +
>  static void test_file_read_write(int fd)
>  {
>         char buf[64];
> @@ -38,7 +40,7 @@ static void test_file_read_write(int fd)
>                     "pwrite on a guest_mem fd should fail");
>  }
>
> -static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
> +static void test_mmap_supported(int fd, size_t total_size)
>  {
>         const char val = 0xaa;
>         char *mem;
> @@ -78,7 +80,7 @@ void fault_sigbus_handler(int signum)
>         siglongjmp(jmpbuf, 1);
>  }
>
> -static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
> +static void test_fault_overflow(int fd, size_t total_size)
>  {
>         struct sigaction sa_old, sa_new = {
>                 .sa_handler = fault_sigbus_handler,
> @@ -106,7 +108,7 @@ static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
>         TEST_ASSERT(!ret, "munmap() should succeed.");
>  }
>
> -static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
> +static void test_mmap_not_supported(int fd, size_t total_size)
>  {
>         char *mem;
>
> @@ -117,7 +119,7 @@ static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
>         TEST_ASSERT_EQ(mem, MAP_FAILED);
>  }
>
> -static void test_file_size(int fd, size_t page_size, size_t total_size)
> +static void test_file_size(int fd, size_t total_size)
>  {
>         struct stat sb;
>         int ret;
> @@ -128,7 +130,7 @@ static void test_file_size(int fd, size_t page_size, size_t total_size)
>         TEST_ASSERT_EQ(sb.st_blksize, page_size);
>  }
>
> -static void test_fallocate(int fd, size_t page_size, size_t total_size)
> +static void test_fallocate(int fd, size_t total_size)
>  {
>         int ret;
>
> @@ -165,7 +167,7 @@ static void test_fallocate(int fd, size_t page_size, size_t total_size)
>         TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed");
>  }
>
> -static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
> +static void test_invalid_punch_hole(int fd, size_t total_size)
>  {
>         struct {
>                 off_t offset;
> @@ -196,8 +198,7 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
>  }
>
>  static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
> -                                                 uint64_t guest_memfd_flags,
> -                                                 size_t page_size)
> +                                                 uint64_t guest_memfd_flags)
>  {
>         size_t size;
>         int fd;
> @@ -214,7 +215,6 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
>  {
>         int fd1, fd2, ret;
>         struct stat st1, st2;
> -       size_t page_size = getpagesize();
>
>         fd1 = __vm_create_guest_memfd(vm, page_size, 0);
>         TEST_ASSERT(fd1 != -1, "memfd creation should succeed");
> @@ -241,7 +241,6 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
>
>  static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
>  {
> -       size_t page_size = getpagesize();
>         uint64_t flag;
>         int fd;
>
> @@ -265,10 +264,8 @@ static void test_guest_memfd(unsigned long vm_type)
>         uint64_t flags = 0;
>         struct kvm_vm *vm;
>         size_t total_size;
> -       size_t page_size;
>         int fd;
>
> -       page_size = getpagesize();
>         total_size = page_size * 4;
>
>         vm = vm_create_barebones_type(vm_type);
> @@ -277,22 +274,22 @@ static void test_guest_memfd(unsigned long vm_type)
>                 flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>
>         test_create_guest_memfd_multiple(vm);
> -       test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
> +       test_create_guest_memfd_invalid_sizes(vm, flags);
>
>         fd = vm_create_guest_memfd(vm, total_size, flags);
>
>         test_file_read_write(fd);
>
>         if (flags & GUEST_MEMFD_FLAG_MMAP) {
> -               test_mmap_supported(fd, page_size, total_size);
> -               test_fault_overflow(fd, page_size, total_size);
> +               test_mmap_supported(fd, total_size);
> +               test_fault_overflow(fd, total_size);
>         } else {
> -               test_mmap_not_supported(fd, page_size, total_size);
> +               test_mmap_not_supported(fd, total_size);
>         }
>
> -       test_file_size(fd, page_size, total_size);
> -       test_fallocate(fd, page_size, total_size);
> -       test_invalid_punch_hole(fd, page_size, total_size);
> +       test_file_size(fd, total_size);
> +       test_fallocate(fd, total_size);
> +       test_invalid_punch_hole(fd, total_size);
>
>         test_guest_memfd_flags(vm, flags);
>
> @@ -367,6 +364,8 @@ int main(int argc, char *argv[])
>
>         TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
>
> +       page_size = getpagesize();
> +
>         /*
>          * Not all architectures support KVM_CAP_VM_TYPES. However, those that
>          * support guest_memfd have that support for the default VM type.
> --
> 2.51.0.536.g15c5d4f767-goog
>

* Re: [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test
  2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
  2025-09-29  9:12   ` Fuad Tabba
@ 2025-09-29  9:17   ` David Hildenbrand
  2025-09-29 10:56   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: David Hildenbrand @ 2025-09-29  9:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, Fuad Tabba, Ackerley Tng

On 26.09.25 18:31, Sean Christopherson wrote:
> Use a global variable to track the host page size in the guest_memfd test
> so that the information doesn't need to be constantly passed around.  The
> state is purely a reflection of the underlying system, i.e. can't be set
> by the test and is constant for a given invocation of the test, and thus
> explicitly passing the host page size to individual testcases adds no
> value, e.g. doesn't allow testing different combinations.
> 
> Making page_size a global will simplify an upcoming change to create a new
> guest_memfd instance per testcase.
> 
> No functional change intended.
> 
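
A minimal sketch of the pattern the changelog describes, assuming a file-scope variable that is written once at startup and only read afterwards. The helper names sketch_init_page_size() and sketch_total_size() are hypothetical and do not appear in the patch, and sysconf(_SC_PAGESIZE) stands in for the getpagesize() call the patch adds to main().

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Host page size: set once at startup, read-only afterwards. */
static size_t page_size;

/* Hypothetical testcase helper: reads the global instead of taking a parameter. */
static size_t sketch_total_size(void)
{
	return page_size * 4;
}

/* Mirrors the assignment the patch adds to main(); call before any testcase. */
static void sketch_init_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	assert(sz > 0);
	page_size = (size_t)sz;
}
```

Because the value cannot change within one invocation, testcases can take only the arguments that actually vary (the fd and the total size).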

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb


* Re: [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase
  2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
@ 2025-09-29  9:18   ` David Hildenbrand
  2025-09-29  9:24   ` Fuad Tabba
  2025-09-29 11:02   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: David Hildenbrand @ 2025-09-29  9:18 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, Fuad Tabba, Ackerley Tng

On 26.09.25 18:31, Sean Christopherson wrote:
> Refactor the guest_memfd selftest to improve test isolation by creating a
> new guest_memfd for each testcase.  Currently, the test reuses a single
> guest_memfd instance for all testcases, and thus creates dependencies
> between tests, e.g. not truncating folios from the guest_memfd instance
> at the end of a test could lead to unexpected results (see the PUNCH_HOLE
> purging that needs to be done by the in-flight NUMA testcases[1]).
> 
> Invoke each test via a macro wrapper to create and close a guest_memfd
> to cut down on the boilerplate copy+paste needed to create a test.
> 
> Link: https://lore.kernel.org/all/20250827175247.83322-10-shivankg@amd.com
> Reported-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
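
A rough illustration of the dependency problem described above, assuming an ordinary memfd (memfd_create()) as a stand-in for a guest_memfd, and a file that grows as a stand-in for folios left behind by an earlier testcase. Every sketch_* name is hypothetical; none of this is the selftest's actual code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Abort with the failing call's name if 'ret' signals an error. */
static void sketch_check(int ret, const char *what)
{
	if (ret < 0) {
		perror(what);
		abort();
	}
}

/* Stand-in for vm_create_guest_memfd(): a plain memfd of 'size' bytes. */
static int sketch_create(off_t size)
{
	int fd = memfd_create("sketch", 0);

	sketch_check(fd, "memfd_create()");
	sketch_check(ftruncate(fd, size), "ftruncate()");
	return fd;
}

/* Hypothetical testcase that leaves state behind by growing the file. */
static void sketch_test_grow(int fd, off_t size)
{
	sketch_check(ftruncate(fd, size * 2), "ftruncate()");
}

/* Hypothetical testcase that expects a pristine file of 'size' bytes. */
static int sketch_size_is_pristine(int fd, off_t size)
{
	struct stat sb;

	sketch_check(fstat(fd, &sb), "fstat()");
	return sb.st_size == size;
}

/* One fd for both testcases: the second sees what the first left behind. */
static int sketch_run_shared(off_t size)
{
	int fd = sketch_create(size);
	int ok;

	sketch_test_grow(fd, size);
	ok = sketch_size_is_pristine(fd, size);
	close(fd);
	return ok;
}

/* A fresh fd per testcase: nothing carries over. */
static int sketch_run_isolated(off_t size)
{
	int fd, ok;

	fd = sketch_create(size);
	sketch_test_grow(fd, size);
	close(fd);

	fd = sketch_create(size);
	ok = sketch_size_is_pristine(fd, size);
	close(fd);
	return ok;
}
```

With a shared fd the second testcase only passes if the first cleans up after itself; with a fresh fd per testcase no such cleanup is needed.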

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb


* Re: [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP
  2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
@ 2025-09-29  9:21   ` David Hildenbrand
  2025-09-29  9:24   ` Fuad Tabba
  1 sibling, 0 replies; 55+ messages in thread
From: David Hildenbrand @ 2025-09-29  9:21 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, Fuad Tabba, Ackerley Tng

On 26.09.25 18:31, Sean Christopherson wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> If a VM type supports KVM_CAP_GUEST_MEMFD_MMAP, the guest_memfd test will
> run all test cases with GUEST_MEMFD_FLAG_MMAP set.  This leaves the code
> path for creating a non-mmap()-able guest_memfd on a VM that supports
> mappable guest memfds untested.
> 
> Refactor the test to run the main test suite with a given set of flags.
> Then, for VM types that support the mappable capability, invoke the test
> suite twice: once with no flags, and once with GUEST_MEMFD_FLAG_MMAP
> set.
> 
> This ensures both creation paths are properly exercised on capable VMs.
> 
> test_guest_memfd_flags() tests valid flags, hence it can be run just once
> per VM type, and valid flag identification can be moved into the test
> function.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> [sean: use double-underscores for the inner helper]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   .../testing/selftests/kvm/guest_memfd_test.c  | 30 ++++++++++++-------
>   1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 60c6dec63490..5a50a28ce1fa 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -239,11 +239,16 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
>   	close(fd1);
>   }
>   
> -static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
> +static void test_guest_memfd_flags(struct kvm_vm *vm)
>   {
> +	uint64_t valid_flags = 0;
>   	uint64_t flag;
>   	int fd;
>   
> +	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> +		valid_flags |= GUEST_MEMFD_FLAG_MMAP |
> +			       GUEST_MEMFD_FLAG_DEFAULT_SHARED;
> +
>   	for (flag = BIT(0); flag; flag <<= 1) {
>   		fd = __vm_create_guest_memfd(vm, page_size, flag);
>   		if (flag & valid_flags) {
> @@ -267,16 +272,8 @@ do {									\
>   	close(fd);							\
>   } while (0)
>   
> -static void test_guest_memfd(unsigned long vm_type)
> +static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>   {
> -	uint64_t flags = 0;
> -	struct kvm_vm *vm;
> -
> -	vm = vm_create_barebones_type(vm_type);
> -
> -	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> -		flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
> -
>   	test_create_guest_memfd_multiple(vm);
>   	test_create_guest_memfd_invalid_sizes(vm, flags);
>   
> @@ -292,8 +289,19 @@ static void test_guest_memfd(unsigned long vm_type)
>   	gmem_test(file_size, vm, flags);
>   	gmem_test(fallocate, vm, flags);
>   	gmem_test(invalid_punch_hole, vm, flags);
> +}
>   
> -	test_guest_memfd_flags(vm, flags);
> +static void test_guest_memfd(unsigned long vm_type)
> +{
> +	struct kvm_vm *vm = vm_create_barebones_type(vm_type);
> +
> +	test_guest_memfd_flags(vm);
> +
> +	__test_guest_memfd(vm, 0);

Having a simple test_guest_memfd_noflags() wrapper might make this 
easier to read.
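
One possible shape for such a wrapper, as a sketch only. The struct kvm_vm declaration is an opaque stand-in, and sketch_run_suite() is a hypothetical placeholder for __test_guest_memfd() that merely records the flags it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct kvm_vm;	/* opaque stand-in for the selftests' VM handle */

/* Flags seen by the most recent suite run (placeholder bookkeeping). */
static uint64_t sketch_last_flags = UINT64_MAX;

/* Hypothetical placeholder for __test_guest_memfd(). */
static void sketch_run_suite(struct kvm_vm *vm, uint64_t flags)
{
	(void)vm;
	sketch_last_flags = flags;
}

/* The wrapper: the name states the intent, the literal 0 is hidden. */
static void test_guest_memfd_noflags(struct kvm_vm *vm)
{
	sketch_run_suite(vm, 0);
}
```

The call site would then read test_guest_memfd_noflags(vm) rather than passing a bare 0.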

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb


* Re: [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase
  2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
  2025-09-29  9:18   ` David Hildenbrand
@ 2025-09-29  9:24   ` Fuad Tabba
  2025-09-29 11:02   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: Fuad Tabba @ 2025-09-29  9:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Ackerley Tng

Hi Sean,

On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>
> Refactor the guest_memfd selftest to improve test isolation by creating a
> new guest_memfd for each testcase.  Currently, the test reuses a single
> guest_memfd instance for all testcases, and thus creates dependencies
> between tests, e.g. not truncating folios from the guest_memfd instance
> at the end of a test could lead to unexpected results (see the PUNCH_HOLE
> purging that needs to be done by the in-flight NUMA testcases[1]).
>
> Invoke each test via a macro wrapper to create and close a guest_memfd
> to cut down on the boilerplate copy+paste needed to create a test.
>
> Link: https://lore.kernel.org/all/20250827175247.83322-10-shivankg@amd.com
> Reported-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 31 ++++++++++---------
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 8251d019206a..60c6dec63490 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -26,7 +26,7 @@
>
>  static size_t page_size;
>
> -static void test_file_read_write(int fd)
> +static void test_file_read_write(int fd, size_t total_size)
>  {
>         char buf[64];
>
> @@ -259,14 +259,18 @@ static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
>         }
>  }
>
> +#define gmem_test(__test, __vm, __flags)                               \
> +do {                                                                   \
> +       int fd = vm_create_guest_memfd(__vm, page_size * 4, __flags);   \
> +                                                                       \
> +       test_##__test(fd, page_size * 4);                               \
> +       close(fd);                                                      \
> +} while (0)
> +

Can we have a const for total_size that sets it to `page_size * 4`
instead of hardcoding that value twice?
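
Something along these lines, as a self-contained sketch rather than a drop-in change. It assumes a plain memfd as a stand-in for vm_create_guest_memfd(), drops the vm and flags arguments for brevity, and uses hypothetical sketch_* helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static size_t page_size;
static int sketch_tests_run;

/* Stand-in for vm_create_guest_memfd(): a plain memfd of 'size' bytes. */
static int sketch_create(size_t size)
{
	int fd = memfd_create("sketch", 0);

	if (fd < 0 || ftruncate(fd, (off_t)size))
		abort();
	return fd;
}

/* Hypothetical testcase: the file must be exactly 'total_size' bytes. */
static void test_file_size(int fd, size_t total_size)
{
	struct stat sb;

	if (fstat(fd, &sb) || (size_t)sb.st_size != total_size)
		abort();
	sketch_tests_run++;
}

/* The size is computed once and used for both creation and the testcase. */
#define gmem_test(__test) \
do { \
	const size_t total_size = page_size * 4; \
	int fd = sketch_create(total_size); \
 \
	test_##__test(fd, total_size); \
	close(fd); \
} while (0)

/* Runs one testcase through the wrapper; returns the number of tests run. */
static int sketch_run(void)
{
	page_size = (size_t)sysconf(_SC_PAGESIZE);
	gmem_test(file_size);
	return sketch_tests_run;
}
```

Declaring total_size inside the do/while block keeps it local to each expansion, so the creation size and the size handed to the testcase cannot drift apart.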

With that:

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

>  static void test_guest_memfd(unsigned long vm_type)
>  {
>         uint64_t flags = 0;
>         struct kvm_vm *vm;
> -       size_t total_size;
> -       int fd;
> -
> -       total_size = page_size * 4;
>
>         vm = vm_create_barebones_type(vm_type);
>
> @@ -276,24 +280,21 @@ static void test_guest_memfd(unsigned long vm_type)
>         test_create_guest_memfd_multiple(vm);
>         test_create_guest_memfd_invalid_sizes(vm, flags);
>
> -       fd = vm_create_guest_memfd(vm, total_size, flags);
> -
> -       test_file_read_write(fd);
> +       gmem_test(file_read_write, vm, flags);
>
>         if (flags & GUEST_MEMFD_FLAG_MMAP) {
> -               test_mmap_supported(fd, total_size);
> -               test_fault_overflow(fd, total_size);
> +               gmem_test(mmap_supported, vm, flags);
> +               gmem_test(fault_overflow, vm, flags);
>         } else {
> -               test_mmap_not_supported(fd, total_size);
> +               gmem_test(mmap_not_supported, vm, flags);
>         }
>
> -       test_file_size(fd, total_size);
> -       test_fallocate(fd, total_size);
> -       test_invalid_punch_hole(fd, total_size);
> +       gmem_test(file_size, vm, flags);
> +       gmem_test(fallocate, vm, flags);
> +       gmem_test(invalid_punch_hole, vm, flags);
>
>         test_guest_memfd_flags(vm, flags);
>
> -       close(fd);
>         kvm_vm_free(vm);
>  }
>
> --
> 2.51.0.536.g15c5d4f767-goog
>

* Re: [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP
  2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
  2025-09-29  9:21   ` David Hildenbrand
@ 2025-09-29  9:24   ` Fuad Tabba
  1 sibling, 0 replies; 55+ messages in thread
From: Fuad Tabba @ 2025-09-29  9:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Ackerley Tng

On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> If a VM type supports KVM_CAP_GUEST_MEMFD_MMAP, the guest_memfd test will
> run all test cases with GUEST_MEMFD_FLAG_MMAP set.  This leaves the code
> path for creating a non-mmap()-able guest_memfd on a VM that supports
> mappable guest memfds untested.
>
> Refactor the test to run the main test suite with a given set of flags.
> Then, for VM types that support the mappable capability, invoke the test
> suite twice: once with no flags, and once with GUEST_MEMFD_FLAG_MMAP
> set.
>
> This ensures both creation paths are properly exercised on capable VMs.
>
> test_guest_memfd_flags() tests valid flags, hence it can be run just once
> per VM type, and valid flag identification can be moved into the test
> function.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> [sean: use double-underscores for the inner helper]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
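
The flag check that moves into test_guest_memfd_flags() walks every single-bit value of a 64-bit mask and compares the creation result against the set of flags expected to be valid. A minimal sketch of that loop, assuming hypothetical flag values and a sketch_create() stand-in that rejects unsupported bits (this is not KVM's implementation):

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)	(UINT64_C(1) << (n))

/* Hypothetical flag values, for illustration only. */
#define SKETCH_FLAG_A	BIT(0)
#define SKETCH_FLAG_B	BIT(1)

/* Stand-in for guest_memfd creation: 0 on success, -1 on unsupported flags. */
static int sketch_create(uint64_t flags, uint64_t supported)
{
	return (flags & ~supported) ? -1 : 0;
}

/*
 * Try each single-bit flag in turn and count the cases where the outcome
 * differs from what valid_flags predicts.  The loop ends when the bit is
 * shifted out of the 64-bit value and 'flag' becomes 0.
 */
static int sketch_count_mismatches(uint64_t valid_flags, uint64_t supported)
{
	uint64_t flag;
	int mismatches = 0;

	for (flag = BIT(0); flag; flag <<= 1) {
		int created = !sketch_create(flag, supported);
		int expected = !!(flag & valid_flags);

		if (created != expected)
			mismatches++;
	}
	return mismatches;
}
```

Because the loop depends only on the VM's capabilities and not on the flags a particular suite run uses, running it once per VM type is sufficient.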

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 30 ++++++++++++-------
>  1 file changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 60c6dec63490..5a50a28ce1fa 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -239,11 +239,16 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
>         close(fd1);
>  }
>
> -static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
> +static void test_guest_memfd_flags(struct kvm_vm *vm)
>  {
> +       uint64_t valid_flags = 0;
>         uint64_t flag;
>         int fd;
>
> +       if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> +               valid_flags |= GUEST_MEMFD_FLAG_MMAP |
> +                              GUEST_MEMFD_FLAG_DEFAULT_SHARED;
> +
>         for (flag = BIT(0); flag; flag <<= 1) {
>                 fd = __vm_create_guest_memfd(vm, page_size, flag);
>                 if (flag & valid_flags) {
> @@ -267,16 +272,8 @@ do {                                                                       \
>         close(fd);                                                      \
>  } while (0)
>
> -static void test_guest_memfd(unsigned long vm_type)
> +static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>  {
> -       uint64_t flags = 0;
> -       struct kvm_vm *vm;
> -
> -       vm = vm_create_barebones_type(vm_type);
> -
> -       if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> -               flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
> -
>         test_create_guest_memfd_multiple(vm);
>         test_create_guest_memfd_invalid_sizes(vm, flags);
>
> @@ -292,8 +289,19 @@ static void test_guest_memfd(unsigned long vm_type)
>         gmem_test(file_size, vm, flags);
>         gmem_test(fallocate, vm, flags);
>         gmem_test(invalid_punch_hole, vm, flags);
> +}
>
> -       test_guest_memfd_flags(vm, flags);
> +static void test_guest_memfd(unsigned long vm_type)
> +{
> +       struct kvm_vm *vm = vm_create_barebones_type(vm_type);
> +
> +       test_guest_memfd_flags(vm);
> +
> +       __test_guest_memfd(vm, 0);
> +
> +       if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> +               __test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP |
> +                                      GUEST_MEMFD_FLAG_DEFAULT_SHARED);
>
>         kvm_vm_free(vm);
>  }
> --
> 2.51.0.536.g15c5d4f767-goog
>

* Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
@ 2025-09-29  9:24   ` Fuad Tabba
  2025-09-29  9:28   ` David Hildenbrand
  2025-09-29 11:08   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: Fuad Tabba @ 2025-09-29  9:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Ackerley Tng

On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>
> Add and use wrappers for mmap() and munmap() that assert success to reduce
> a significant amount of boilerplate code, to ensure all tests assert on
> failure, and to provide consistent error messages on failure.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
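
For readers without the selftests tree at hand, a standalone approximation of the wrapper pattern. It assumes perror()/abort() in place of TEST_ASSERT() and __KVM_SYSCALL_ERROR(), and the sketch_* names are hypothetical; the real helpers are the kvm_mmap() and kvm_munmap() functions in the diff.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* mmap() at offset 0 that never returns MAP_FAILED to the caller. */
static void *sketch_mmap(size_t size, int prot, int flags, int fd)
{
	void *mem = mmap(NULL, size, prot, flags, fd, 0);

	if (mem == MAP_FAILED) {
		perror("mmap()");
		abort();
	}
	return mem;
}

/* munmap() that aborts instead of returning an error. */
static void sketch_munmap(void *mem, size_t size)
{
	if (munmap(mem, size)) {
		perror("munmap()");
		abort();
	}
}

/* Usage: map anonymous memory, fill it, verify both ends, unmap. */
static int sketch_roundtrip(size_t size)
{
	unsigned char *mem;
	int ok;

	mem = sketch_mmap(size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1);
	memset(mem, 0xaa, size);
	ok = mem[0] == 0xaa && mem[size - 1] == 0xaa;
	sketch_munmap(mem, size);
	return ok;
}
```

Callers lose the per-site error strings but gain a uniform message and can no longer forget the check, which is the trade-off the changelog describes.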


Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
(except for the s390/ucontrol_test.c, which I didn't test)

Cheers,
/fuad

> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 21 +++------
>  .../testing/selftests/kvm/include/kvm_util.h  | 25 +++++++++++
>  tools/testing/selftests/kvm/lib/kvm_util.c    | 44 +++++++------------
>  tools/testing/selftests/kvm/mmu_stress_test.c |  5 +--
>  .../selftests/kvm/s390/ucontrol_test.c        | 16 +++----
>  .../selftests/kvm/set_memory_region_test.c    | 17 ++++---
>  6 files changed, 64 insertions(+), 64 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 5a50a28ce1fa..5dd40b77dc07 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -50,8 +50,7 @@ static void test_mmap_supported(int fd, size_t total_size)
>         mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
>         TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
>
> -       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> -       TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
> +       mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
>
>         memset(mem, val, total_size);
>         for (i = 0; i < total_size; i++)
> @@ -70,8 +69,7 @@ static void test_mmap_supported(int fd, size_t total_size)
>         for (i = 0; i < total_size; i++)
>                 TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
>
> -       ret = munmap(mem, total_size);
> -       TEST_ASSERT(!ret, "munmap() should succeed.");
> +       kvm_munmap(mem, total_size);
>  }
>
>  static sigjmp_buf jmpbuf;
> @@ -89,10 +87,8 @@ static void test_fault_overflow(int fd, size_t total_size)
>         const char val = 0xaa;
>         char *mem;
>         size_t i;
> -       int ret;
>
> -       mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> -       TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
> +       mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
>
>         sigaction(SIGBUS, &sa_new, &sa_old);
>         if (sigsetjmp(jmpbuf, 1) == 0) {
> @@ -104,8 +100,7 @@ static void test_fault_overflow(int fd, size_t total_size)
>         for (i = 0; i < total_size; i++)
>                 TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
>
> -       ret = munmap(mem, map_size);
> -       TEST_ASSERT(!ret, "munmap() should succeed.");
> +       kvm_munmap(mem, map_size);
>  }
>
>  static void test_mmap_not_supported(int fd, size_t total_size)
> @@ -347,10 +342,9 @@ static void test_guest_memfd_guest(void)
>                                              GUEST_MEMFD_FLAG_DEFAULT_SHARED);
>         vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
>
> -       mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> -       TEST_ASSERT(mem != MAP_FAILED, "mmap() on guest_memfd failed");
> +       mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
>         memset(mem, 0xaa, size);
> -       munmap(mem, size);
> +       kvm_munmap(mem, size);
>
>         virt_pg_map(vm, gpa, gpa);
>         vcpu_args_set(vcpu, 2, gpa, size);
> @@ -358,8 +352,7 @@ static void test_guest_memfd_guest(void)
>
>         TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
>
> -       mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> -       TEST_ASSERT(mem != MAP_FAILED, "mmap() on guest_memfd failed");
> +       mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
>         for (i = 0; i < size; i++)
>                 TEST_ASSERT_EQ(mem[i], 0xff);
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 23a506d7eca3..1c68ff0fb3fb 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -278,6 +278,31 @@ static inline bool kvm_has_cap(long cap)
>  #define __KVM_SYSCALL_ERROR(_name, _ret) \
>         "%s failed, rc: %i errno: %i (%s)", (_name), (_ret), errno, strerror(errno)
>
> +static inline void *__kvm_mmap(size_t size, int prot, int flags, int fd,
> +                              off_t offset)
> +{
> +       void *mem;
> +
> +       mem = mmap(NULL, size, prot, flags, fd, offset);
> +       TEST_ASSERT(mem != MAP_FAILED, __KVM_SYSCALL_ERROR("mmap()",
> +                   (int)(unsigned long)MAP_FAILED));
> +
> +       return mem;
> +}
> +
> +static inline void *kvm_mmap(size_t size, int prot, int flags, int fd)
> +{
> +       return __kvm_mmap(size, prot, flags, fd, 0);
> +}
> +
> +static inline void kvm_munmap(void *mem, size_t size)
> +{
> +       int ret;
> +
> +       ret = munmap(mem, size);
> +       TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
> +}
> +
>  /*
>   * Use the "inner", double-underscore macro when reporting errors from within
>   * other macros so that the name of ioctl() and not its literal numeric value
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index c3f5142b0a54..da754b152c11 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -770,13 +770,11 @@ static void vm_vcpu_rm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
>         int ret;
>
>         if (vcpu->dirty_gfns) {
> -               ret = munmap(vcpu->dirty_gfns, vm->dirty_ring_size);
> -               TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
> +               kvm_munmap(vcpu->dirty_gfns, vm->dirty_ring_size);
>                 vcpu->dirty_gfns = NULL;
>         }
>
> -       ret = munmap(vcpu->run, vcpu_mmap_sz());
> -       TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
> +       kvm_munmap(vcpu->run, vcpu_mmap_sz());
>
>         ret = close(vcpu->fd);
>         TEST_ASSERT(!ret,  __KVM_SYSCALL_ERROR("close()", ret));
> @@ -810,20 +808,16 @@ void kvm_vm_release(struct kvm_vm *vmp)
>  static void __vm_mem_region_delete(struct kvm_vm *vm,
>                                    struct userspace_mem_region *region)
>  {
> -       int ret;
> -
>         rb_erase(&region->gpa_node, &vm->regions.gpa_tree);
>         rb_erase(&region->hva_node, &vm->regions.hva_tree);
>         hash_del(&region->slot_node);
>
>         sparsebit_free(&region->unused_phy_pages);
>         sparsebit_free(&region->protected_phy_pages);
> -       ret = munmap(region->mmap_start, region->mmap_size);
> -       TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
> +       kvm_munmap(region->mmap_start, region->mmap_size);
>         if (region->fd >= 0) {
>                 /* There's an extra map when using shared memory. */
> -               ret = munmap(region->mmap_alias, region->mmap_size);
> -               TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
> +               kvm_munmap(region->mmap_alias, region->mmap_size);
>                 close(region->fd);
>         }
>         if (region->region.guest_memfd >= 0)
> @@ -1080,12 +1074,9 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>                 region->fd = kvm_memfd_alloc(region->mmap_size,
>                                              src_type == VM_MEM_SRC_SHARED_HUGETLB);
>
> -       region->mmap_start = mmap(NULL, region->mmap_size,
> -                                 PROT_READ | PROT_WRITE,
> -                                 vm_mem_backing_src_alias(src_type)->flag,
> -                                 region->fd, 0);
> -       TEST_ASSERT(region->mmap_start != MAP_FAILED,
> -                   __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED));
> +       region->mmap_start = kvm_mmap(region->mmap_size, PROT_READ | PROT_WRITE,
> +                                     vm_mem_backing_src_alias(src_type)->flag,
> +                                     region->fd);
>
>         TEST_ASSERT(!is_backing_src_hugetlb(src_type) ||
>                     region->mmap_start == align_ptr_up(region->mmap_start, backing_src_pagesz),
> @@ -1156,12 +1147,10 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>
>         /* If shared memory, create an alias. */
>         if (region->fd >= 0) {
> -               region->mmap_alias = mmap(NULL, region->mmap_size,
> -                                         PROT_READ | PROT_WRITE,
> -                                         vm_mem_backing_src_alias(src_type)->flag,
> -                                         region->fd, 0);
> -               TEST_ASSERT(region->mmap_alias != MAP_FAILED,
> -                           __KVM_SYSCALL_ERROR("mmap()",  (int)(unsigned long)MAP_FAILED));
> +               region->mmap_alias = kvm_mmap(region->mmap_size,
> +                                             PROT_READ | PROT_WRITE,
> +                                             vm_mem_backing_src_alias(src_type)->flag,
> +                                             region->fd);
>
>                 /* Align host alias address */
>                 region->host_alias = align_ptr_up(region->mmap_alias, alignment);
> @@ -1371,10 +1360,8 @@ struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)
>         TEST_ASSERT(vcpu_mmap_sz() >= sizeof(*vcpu->run), "vcpu mmap size "
>                 "smaller than expected, vcpu_mmap_sz: %i expected_min: %zi",
>                 vcpu_mmap_sz(), sizeof(*vcpu->run));
> -       vcpu->run = (struct kvm_run *) mmap(NULL, vcpu_mmap_sz(),
> -               PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd, 0);
> -       TEST_ASSERT(vcpu->run != MAP_FAILED,
> -                   __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED));
> +       vcpu->run = kvm_mmap(vcpu_mmap_sz(), PROT_READ | PROT_WRITE,
> +                            MAP_SHARED, vcpu->fd);
>
>         if (kvm_has_cap(KVM_CAP_BINARY_STATS_FD))
>                 vcpu->stats.fd = vcpu_get_stats_fd(vcpu);
> @@ -1821,9 +1808,8 @@ void *vcpu_map_dirty_ring(struct kvm_vcpu *vcpu)
>                             page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
>                 TEST_ASSERT(addr == MAP_FAILED, "Dirty ring mapped exec");
>
> -               addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd,
> -                           page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
> -               TEST_ASSERT(addr != MAP_FAILED, "Dirty ring map failed");
> +               addr = __kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd,
> +                                 page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
>
>                 vcpu->dirty_gfns = addr;
>                 vcpu->dirty_gfns_count = size / sizeof(struct kvm_dirty_gfn);
> diff --git a/tools/testing/selftests/kvm/mmu_stress_test.c b/tools/testing/selftests/kvm/mmu_stress_test.c
> index 6a437d2be9fa..37b7e6524533 100644
> --- a/tools/testing/selftests/kvm/mmu_stress_test.c
> +++ b/tools/testing/selftests/kvm/mmu_stress_test.c
> @@ -339,8 +339,7 @@ int main(int argc, char *argv[])
>         TEST_ASSERT(max_gpa > (4 * slot_size), "MAXPHYADDR <4gb ");
>
>         fd = kvm_memfd_alloc(slot_size, hugepages);
> -       mem = mmap(NULL, slot_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> -       TEST_ASSERT(mem != MAP_FAILED, "mmap() failed");
> +       mem = kvm_mmap(slot_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
>
>         TEST_ASSERT(!madvise(mem, slot_size, MADV_NOHUGEPAGE), "madvise() failed");
>
> @@ -413,7 +412,7 @@ int main(int argc, char *argv[])
>         for (slot = (slot - 1) & ~1ull; slot >= first_slot; slot -= 2)
>                 vm_set_user_memory_region(vm, slot, 0, 0, 0, NULL);
>
> -       munmap(mem, slot_size / 2);
> +       kvm_munmap(mem, slot_size / 2);
>
>         /* Sanity check that the vCPUs actually ran. */
>         for (i = 0; i < nr_vcpus; i++)
> diff --git a/tools/testing/selftests/kvm/s390/ucontrol_test.c b/tools/testing/selftests/kvm/s390/ucontrol_test.c
> index d265b34c54be..50bc1c38225a 100644
> --- a/tools/testing/selftests/kvm/s390/ucontrol_test.c
> +++ b/tools/testing/selftests/kvm/s390/ucontrol_test.c
> @@ -142,19 +142,17 @@ FIXTURE_SETUP(uc_kvm)
>         self->kvm_run_size = ioctl(self->kvm_fd, KVM_GET_VCPU_MMAP_SIZE, NULL);
>         ASSERT_GE(self->kvm_run_size, sizeof(struct kvm_run))
>                   TH_LOG(KVM_IOCTL_ERROR(KVM_GET_VCPU_MMAP_SIZE, self->kvm_run_size));
> -       self->run = (struct kvm_run *)mmap(NULL, self->kvm_run_size,
> -                   PROT_READ | PROT_WRITE, MAP_SHARED, self->vcpu_fd, 0);
> -       ASSERT_NE(self->run, MAP_FAILED);
> +       self->run = kvm_mmap(self->kvm_run_size, PROT_READ | PROT_WRITE,
> +                            MAP_SHARED, self->vcpu_fd);
>         /**
>          * For virtual cpus that have been created with S390 user controlled
>          * virtual machines, the resulting vcpu fd can be memory mapped at page
>          * offset KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of
>          * the virtual cpu's hardware control block.
>          */
> -       self->sie_block = (struct kvm_s390_sie_block *)mmap(NULL, PAGE_SIZE,
> -                         PROT_READ | PROT_WRITE, MAP_SHARED,
> -                         self->vcpu_fd, KVM_S390_SIE_PAGE_OFFSET << PAGE_SHIFT);
> -       ASSERT_NE(self->sie_block, MAP_FAILED);
> +       self->sie_block = __kvm_mmap(PAGE_SIZE, PROT_READ | PROT_WRITE,
> +                                    MAP_SHARED, self->vcpu_fd,
> +                                    KVM_S390_SIE_PAGE_OFFSET << PAGE_SHIFT);
>
>         TH_LOG("VM created %p %p", self->run, self->sie_block);
>
> @@ -186,8 +184,8 @@ FIXTURE_SETUP(uc_kvm)
>
>  FIXTURE_TEARDOWN(uc_kvm)
>  {
> -       munmap(self->sie_block, PAGE_SIZE);
> -       munmap(self->run, self->kvm_run_size);
> +       kvm_munmap(self->sie_block, PAGE_SIZE);
> +       kvm_munmap(self->run, self->kvm_run_size);
>         close(self->vcpu_fd);
>         close(self->vm_fd);
>         close(self->kvm_fd);
> diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
> index ce3ac0fd6dfb..7fe427ff9b38 100644
> --- a/tools/testing/selftests/kvm/set_memory_region_test.c
> +++ b/tools/testing/selftests/kvm/set_memory_region_test.c
> @@ -433,10 +433,10 @@ static void test_add_max_memory_regions(void)
>         pr_info("Adding slots 0..%i, each memory region with %dK size\n",
>                 (max_mem_slots - 1), MEM_REGION_SIZE >> 10);
>
> -       mem = mmap(NULL, (size_t)max_mem_slots * MEM_REGION_SIZE + alignment,
> -                  PROT_READ | PROT_WRITE,
> -                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
> -       TEST_ASSERT(mem != MAP_FAILED, "Failed to mmap() host");
> +
> +       mem = kvm_mmap((size_t)max_mem_slots * MEM_REGION_SIZE + alignment,
> +                      PROT_READ | PROT_WRITE,
> +                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1);
>         mem_aligned = (void *)(((size_t) mem + alignment - 1) & ~(alignment - 1));
>
>         for (slot = 0; slot < max_mem_slots; slot++)
> @@ -446,9 +446,8 @@ static void test_add_max_memory_regions(void)
>                                           mem_aligned + (uint64_t)slot * MEM_REGION_SIZE);
>
>         /* Check it cannot be added memory slots beyond the limit */
> -       mem_extra = mmap(NULL, MEM_REGION_SIZE, PROT_READ | PROT_WRITE,
> -                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> -       TEST_ASSERT(mem_extra != MAP_FAILED, "Failed to mmap() host");
> +       mem_extra = kvm_mmap(MEM_REGION_SIZE, PROT_READ | PROT_WRITE,
> +                            MAP_PRIVATE | MAP_ANONYMOUS, -1);
>
>         ret = __vm_set_user_memory_region(vm, max_mem_slots, 0,
>                                           (uint64_t)max_mem_slots * MEM_REGION_SIZE,
> @@ -456,8 +455,8 @@ static void test_add_max_memory_regions(void)
>         TEST_ASSERT(ret == -1 && errno == EINVAL,
>                     "Adding one more memory slot should fail with EINVAL");
>
> -       munmap(mem, (size_t)max_mem_slots * MEM_REGION_SIZE + alignment);
> -       munmap(mem_extra, MEM_REGION_SIZE);
> +       kvm_munmap(mem, (size_t)max_mem_slots * MEM_REGION_SIZE + alignment);
> +       kvm_munmap(mem_extra, MEM_REGION_SIZE);
>         kvm_vm_free(vm);
>  }
>
> --
> 2.51.0.536.g15c5d4f767-goog
>


* Re: [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
@ 2025-09-29  9:24   ` Fuad Tabba
  2025-09-29  9:28   ` David Hildenbrand
  2025-09-29 14:38   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: Fuad Tabba @ 2025-09-29  9:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Ackerley Tng

On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>
> Add a guest_memfd testcase to verify that faulting in private memory gets
> a SIGBUS.  For now, test only the case where memory is private by default
> since KVM doesn't yet support in-place conversion.
>
> Cc: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>


Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 62 ++++++++++++++-----
>  1 file changed, 46 insertions(+), 16 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 5dd40b77dc07..b5a631aca933 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -40,17 +40,26 @@ static void test_file_read_write(int fd, size_t total_size)
>                     "pwrite on a guest_mem fd should fail");
>  }
>
> -static void test_mmap_supported(int fd, size_t total_size)
> +static void *test_mmap_common(int fd, size_t size)
>  {
> -       const char val = 0xaa;
> -       char *mem;
> -       size_t i;
> -       int ret;
> +       void *mem;
>
> -       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> +       mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
>         TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
>
> -       mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> +       mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> +
> +       return mem;
> +}
> +
> +static void test_mmap_supported(int fd, size_t total_size)
> +{
> +       const char val = 0xaa;
> +       char *mem;
> +       size_t i;
> +       int ret;
> +
> +       mem = test_mmap_common(fd, total_size);
>
>         memset(mem, val, total_size);
>         for (i = 0; i < total_size; i++)
> @@ -78,31 +87,47 @@ void fault_sigbus_handler(int signum)
>         siglongjmp(jmpbuf, 1);
>  }
>
> -static void test_fault_overflow(int fd, size_t total_size)
> +static void *test_fault_sigbus(int fd, size_t size)
>  {
>         struct sigaction sa_old, sa_new = {
>                 .sa_handler = fault_sigbus_handler,
>         };
> -       size_t map_size = total_size * 4;
> -       const char val = 0xaa;
> -       char *mem;
> -       size_t i;
> +       void *mem;
>
> -       mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> +       mem = test_mmap_common(fd, size);
>
>         sigaction(SIGBUS, &sa_new, &sa_old);
>         if (sigsetjmp(jmpbuf, 1) == 0) {
> -               memset(mem, 0xaa, map_size);
> +               memset(mem, 0xaa, size);
>                 TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
>         }
>         sigaction(SIGBUS, &sa_old, NULL);
>
> +       return mem;
> +}
> +
> +static void test_fault_overflow(int fd, size_t total_size)
> +{
> +       size_t map_size = total_size * 4;
> +       const char val = 0xaa;
> +       char *mem;
> +       size_t i;
> +
> +       mem = test_fault_sigbus(fd, map_size);
> +
>         for (i = 0; i < total_size; i++)
>                 TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
>
>         kvm_munmap(mem, map_size);
>  }
>
> +static void test_fault_private(int fd, size_t total_size)
> +{
> +       void *mem = test_fault_sigbus(fd, total_size);
> +
> +       kvm_munmap(mem, total_size);
> +}
> +
>  static void test_mmap_not_supported(int fd, size_t total_size)
>  {
>         char *mem;
> @@ -274,9 +299,12 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>
>         gmem_test(file_read_write, vm, flags);
>
> -       if (flags & GUEST_MEMFD_FLAG_MMAP) {
> +       if (flags & GUEST_MEMFD_FLAG_MMAP &&
> +           flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
>                 gmem_test(mmap_supported, vm, flags);
>                 gmem_test(fault_overflow, vm, flags);
> +       } else if (flags & GUEST_MEMFD_FLAG_MMAP) {
> +               gmem_test(fault_private, vm, flags);
>         } else {
>                 gmem_test(mmap_not_supported, vm, flags);
>         }
> @@ -294,9 +322,11 @@ static void test_guest_memfd(unsigned long vm_type)
>
>         __test_guest_memfd(vm, 0);
>
> -       if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> +       if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP)) {
> +               __test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP);
>                 __test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP |
>                                        GUEST_MEMFD_FLAG_DEFAULT_SHARED);
> +       }
>
>         kvm_vm_free(vm);
>  }
> --
> 2.51.0.536.g15c5d4f767-goog
>
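
A minimal, standalone sketch of the sigsetjmp()/siglongjmp() pattern that
test_fault_sigbus() in the patch above relies on.  It does not touch
guest_memfd at all: purely as an assumption for illustration, SIGBUS is
provoked by writing to a page beyond the end of a mapped temporary file.
write_raises_sigbus() and map_past_eof() are hypothetical names, not
selftest helpers.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmpbuf;

static void sigbus_handler(int signum)
{
	(void)signum;
	/* Jump back to the sigsetjmp() call site, making it return 1. */
	siglongjmp(sigbus_jmpbuf, 1);
}

/* Hypothetical helper: returns 1 if writing to @addr raises SIGBUS, else 0. */
static int write_raises_sigbus(volatile char *addr)
{
	struct sigaction sa_old, sa_new;
	volatile int raised = 1;

	memset(&sa_new, 0, sizeof(sa_new));
	sa_new.sa_handler = sigbus_handler;
	sigemptyset(&sa_new.sa_mask);

	sigaction(SIGBUS, &sa_new, &sa_old);
	if (sigsetjmp(sigbus_jmpbuf, 1) == 0) {
		*addr = (char)0xaa;
		raised = 0;	/* Reached only if the store did not fault. */
	}
	sigaction(SIGBUS, &sa_old, NULL);

	return raised;
}

/*
 * Hypothetical helper: map two pages of a one-page temporary file.  The
 * second page lies entirely beyond EOF, so touching it raises SIGBUS.
 */
static char *map_past_eof(size_t page_size)
{
	FILE *file = tmpfile();
	char *mem;

	assert(page_size > 0);
	if (!file || ftruncate(fileno(file), (off_t)page_size))
		abort();

	mem = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fileno(file), 0);
	if (mem == MAP_FAILED)
		abort();

	fclose(file);	/* The mapping stays valid after the file is closed. */
	return mem;
}
```

Passing 1 as the second argument to sigsetjmp() saves the signal mask, so
SIGBUS is unblocked again after the handler jumps out, and the helper can
be called repeatedly.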


* Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
  2025-09-29  9:24   ` Fuad Tabba
@ 2025-09-29  9:28   ` David Hildenbrand
  2025-09-29 11:08   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: David Hildenbrand @ 2025-09-29  9:28 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, Fuad Tabba, Ackerley Tng

On 26.09.25 18:31, Sean Christopherson wrote:
> Add and use wrappers for mmap() and munmap() that assert success to reduce
> a significant amount of boilerplate code, to ensure all tests assert on
> failure, and to provide consistent error messages on failure.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb
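
A rough sketch of what assert-on-failure wrappers for mmap() and munmap()
can look like.  checked_mmap() and checked_munmap() are hypothetical names
chosen for illustration; the kvm_mmap()/kvm_munmap() helpers added by the
patch live in the KVM selftest library, and their failure reporting is
replaced here by fprintf() plus abort() so the snippet builds on its own.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical wrapper: mmap() that never returns MAP_FAILED to the caller. */
static void *checked_mmap(size_t size, int prot, int flags, int fd,
			  off_t offset)
{
	void *mem;

	assert(size > 0);

	mem = mmap(NULL, size, prot, flags, fd, offset);
	if (mem == MAP_FAILED) {
		fprintf(stderr, "mmap() of %zu bytes failed: %s\n",
			size, strerror(errno));
		abort();
	}

	return mem;
}

/* Hypothetical wrapper: munmap() that aborts instead of failing silently. */
static void checked_munmap(void *mem, size_t size)
{
	if (munmap(mem, size)) {
		fprintf(stderr, "munmap() of %zu bytes failed: %s\n",
			size, strerror(errno));
		abort();
	}
}
```

Centralizing the check means callers drop their own MAP_FAILED tests, and
every failure produces the same style of message.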



* Re: [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
  2025-09-29  9:24   ` Fuad Tabba
@ 2025-09-29  9:28   ` David Hildenbrand
  2025-09-29 14:38   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: David Hildenbrand @ 2025-09-29  9:28 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, Fuad Tabba, Ackerley Tng

On 26.09.25 18:31, Sean Christopherson wrote:
> Add a guest_memfd testcase to verify that faulting in private memory gets
> a SIGBUS.  For now, test only the case where memory is private by default
> since KVM doesn't yet support in-place conversion.
> 
> Cc: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb



* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29  9:04   ` Fuad Tabba
@ 2025-09-29  9:43     ` Ackerley Tng
  2025-09-29 10:15       ` Patrick Roy
  2025-09-29 16:54       ` Sean Christopherson
  0 siblings, 2 replies; 55+ messages in thread
From: Ackerley Tng @ 2025-09-29  9:43 UTC (permalink / raw)
  To: Fuad Tabba, Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand, roypat,
	Kalyazin, Nikita

Fuad Tabba <tabba@google.com> writes:

> Hi Sean,
>
> On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>>
>> Add a guest_memfd flag to allow userspace to state that the underlying
>> memory should be configured to be shared by default, and reject user page
>> faults if the guest_memfd instance's memory isn't shared by default.
>> Because KVM doesn't yet support in-place private<=>shared conversions, all
>> guest_memfd memory effectively follows the default state.
>>
>> Alternatively, KVM could deduce the default state based on MMAP, which for
>> all intents and purposes is what KVM currently does.  However, implicitly
>> deriving the default state based on MMAP will result in a messy ABI when
>> support for in-place conversions is added.
>>
>> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
>> by default (otherwise the memory would be unusable).  If MMAP implies
>> memory is shared by default, then the default state for CoCo VMs will vary
>> based on MMAP, and from userspace's perspective, will change when in-place
>> conversion support is added.  I.e. to maintain guest<=>host ABI, userspace
>> would need to immediately convert all memory from shared=>private, which
>> is both ugly and inefficient.  The inefficiency could be avoided by adding
>> a flag to state that memory is _private_ by default, irrespective of MMAP,
>> but that would lead to an equally messy and hard to document ABI.
>>
>> Bite the bullet and immediately add a flag to control the default state so
>> that the effective behavior is explicit and straightforward.
>>

I like having this flag, but didn't propose this because I thought folks
depending on the default being shared (Patrick/Nikita) might have their
usage broken.

>> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Fuad Tabba <tabba@google.com>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>>  Documentation/virt/kvm/api.rst                 | 10 ++++++++--
>>  include/uapi/linux/kvm.h                       |  3 ++-
>>  tools/testing/selftests/kvm/guest_memfd_test.c |  5 +++--
>>  virt/kvm/guest_memfd.c                         |  6 +++++-
>>  4 files changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index c17a87a0a5ac..4dfe156bbe3c 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
>>  a single guest_memfd file, but the bound ranges must not overlap).
>>
>>  When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
>> -supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
>> -enables mmap() and faulting of guest_memfd memory to host userspace.
>> +supports GUEST_MEMFD_FLAG_MMAP and  GUEST_MEMFD_FLAG_DEFAULT_SHARED.  Setting
>
> There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.
>

+1 on this. Also, would you consider putting the concept of "at creation
time" or "at initialization time" into the name of the flag?

"Default" could be interpreted as "whenever a folio is allocated for
this guest_memfd, the memory the folio represents is shared by
default".

What we want to represent is that when the guest_memfd is created,
memory at all indices is initialized as shared.

Looking a bit further, when conversion is supported, if this flag is not
specified, then all the indices are initialized as private, right?

>> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
>> +memory to host userspace (so long as the memory is currently shared).  Setting
>> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
>> +by default).  Note!  Because KVM doesn't yet support in-place private<=>shared
>> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
>> +userspace page tables.  This limitation will go away when in-place conversions
>> +are supported.
>
> I think that a more accurate (and future proof) description of the
> mmap flag could be something along the lines of:
>

+1 on these suggestions, I agree that making the concepts of SHARED vs
MMAP orthogonal from the start is more future proof.

> + Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.
>
> + Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
> + by default

See above, I'd prefer clarifying this as "at initialization time" or
something similar.

> , as opposed to private. Shared memory can be faulted into host
> + userspace page tables. Private memory cannot.
>
>>  When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>>  guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6efa98a57ec1..38a2c083b6aa 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
>>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>>
>>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
>> -#define GUEST_MEMFD_FLAG_MMAP  (1ULL << 0)
>> +#define GUEST_MEMFD_FLAG_MMAP          (1ULL << 0)
>> +#define GUEST_MEMFD_FLAG_DEFAULT_SHARED        (1ULL << 1)
>>
>>  struct kvm_create_guest_memfd {
>>         __u64 size;
>> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
>> index b3ca6737f304..81b11a958c7a 100644
>> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
>> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
>> @@ -274,7 +274,7 @@ static void test_guest_memfd(unsigned long vm_type)
>>         vm = vm_create_barebones_type(vm_type);
>>
>>         if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
>> -               flags |= GUEST_MEMFD_FLAG_MMAP;
>> +               flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>
>>         test_create_guest_memfd_multiple(vm);
>>         test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
>> @@ -337,7 +337,8 @@ static void test_guest_memfd_guest(void)
>>                     "Default VM type should always support guest_memfd mmap()");
>>
>>         size = vm->page_size;
>> -       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
>> +       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
>> +                                            GUEST_MEMFD_FLAG_DEFAULT_SHARED);
>>         vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
>>
>>         mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 08a6bc7d25b6..19f05a45be04 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>         if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>>                 return VM_FAULT_SIGBUS;
>>
>> +       if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
>> +               return VM_FAULT_SIGBUS;
>> +
>>         folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>         if (IS_ERR(folio)) {
>>                 int err = PTR_ERR(folio);
>> @@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>>         u64 valid_flags = 0;
>>
>>         if (kvm_arch_supports_gmem_mmap(kvm))
>> -               valid_flags |= GUEST_MEMFD_FLAG_MMAP;
>> +               valid_flags |= GUEST_MEMFD_FLAG_MMAP |
>> +                              GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>
> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
> checking for that, at least until we have in-place conversion? Having
> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP,
> isn't a useful combination.
>

I think it's okay to have the two flags be orthogonal from the start.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>

> That said, these are all nits, I'll leave it to you. With that:
>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Tested-by: Fuad Tabba <tabba@google.com>
>
> Cheers,
> /fuad
>
>
>
>>
>>         if (flags & ~valid_flags)
>>                 return -EINVAL;
>> --
>> 2.51.0.536.g15c5d4f767-goog
>>


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29  9:43     ` Ackerley Tng
@ 2025-09-29 10:15       ` Patrick Roy
  2025-09-29 10:22         ` David Hildenbrand
  2025-09-29 16:54       ` Sean Christopherson
  1 sibling, 1 reply; 55+ messages in thread
From: Patrick Roy @ 2025-09-29 10:15 UTC (permalink / raw)
  To: Ackerley Tng, Fuad Tabba, Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Kalyazin, Nikita, shivankg

Hi Ackerley!

On Mon, 2025-09-29 at 10:43 +0100, Ackerley Tng wrote:
> Fuad Tabba <tabba@google.com> writes:
> 
>> Hi Sean,
>>
>> On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>>>
>>> Add a guest_memfd flag to allow userspace to state that the underlying
>>> memory should be configured to be shared by default, and reject user page
>>> faults if the guest_memfd instance's memory isn't shared by default.
>>> Because KVM doesn't yet support in-place private<=>shared conversions, all
>>> guest_memfd memory effectively follows the default state.
>>>
>>> Alternatively, KVM could deduce the default state based on MMAP, which for
>>> all intents and purposes is what KVM currently does.  However, implicitly
>>> deriving the default state based on MMAP will result in a messy ABI when
>>> support for in-place conversions is added.
>>>
>>> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
>>> by default (otherwise the memory would be unusable).  If MMAP implies
>>> memory is shared by default, then the default state for CoCo VMs will vary
>>> based on MMAP, and from userspace's perspective, will change when in-place
>>> conversion support is added.  I.e. to maintain guest<=>host ABI, userspace
>>> would need to immediately convert all memory from shared=>private, which
>>> is both ugly and inefficient.  The inefficiency could be avoided by adding
>>> a flag to state that memory is _private_ by default, irrespective of MMAP,
>>> but that would lead to an equally messy and hard to document ABI.
>>>
>>> Bite the bullet and immediately add a flag to control the default state so
>>> that the effective behavior is explicit and straightforward.
>>>
> 
> I like having this flag, but didn't propose this because I thought folks
> depending on the default being shared (Patrick/Nikita) might have their
> usage broken.

We'll just need to pass the new flag in Firecracker, that's not a problem
at all :) We aren't running this anywhere in production yet, so nothing
would break on our end.

>>> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Fuad Tabba <tabba@google.com>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>> ---
>>>  Documentation/virt/kvm/api.rst                 | 10 ++++++++--
>>>  include/uapi/linux/kvm.h                       |  3 ++-
>>>  tools/testing/selftests/kvm/guest_memfd_test.c |  5 +++--
>>>  virt/kvm/guest_memfd.c                         |  6 +++++-
>>>  4 files changed, 18 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index c17a87a0a5ac..4dfe156bbe3c 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
>>>  a single guest_memfd file, but the bound ranges must not overlap).
>>>
>>>  When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
>>> -supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
>>> -enables mmap() and faulting of guest_memfd memory to host userspace.
>>> +supports GUEST_MEMFD_FLAG_MMAP and  GUEST_MEMFD_FLAG_DEFAULT_SHARED.  Setting
>>
>> There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.
>>
> 
> +1 on this. Also, would you consider putting the concept of "at creation
> time" or "at initialization time" into the name of the flag?
> 
> "Default" could be interpreted as "whenever a folio is allocated for
> this guest_memfd, the memory the folio represents is shared by
> default".
> 
> What we want to represent is that when the guest_memfd is created,
> memory at all indices is initialized as shared.
> 
> Looking a bit further, when conversion is supported, if this flag is not
> specified, then all the indices are initialized as private, right?
> 
>>> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
>>> +memory to host userspace (so long as the memory is currently shared).  Setting
>>> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
>>> +by default).  Note!  Because KVM doesn't yet support in-place private<=>shared
>>> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
>>> +userspace page tables.  This limitation will go away when in-place conversions
>>> +are supported.
>>
>> I think that a more accurate (and future proof) description of the
>> mmap flag could be something along the lines of:
>>
> 
> +1 on these suggestions, I agree that making the concepts of SHARED vs
> MMAP orthogonal from the start is more future proof.
> 
>> + Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.
>>
>> + Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
>> + by default
> 
> See above, I'd prefer clarifying this as "at initialization time" or
> something similar.
> 
>> , as opposed to private. Shared memory can be faulted into host
>> + userspace page tables. Private memory cannot.
>>
>>>  When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>>>  guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 6efa98a57ec1..38a2c083b6aa 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
>>>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>>>
>>>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
>>> -#define GUEST_MEMFD_FLAG_MMAP  (1ULL << 0)
>>> +#define GUEST_MEMFD_FLAG_MMAP          (1ULL << 0)
>>> +#define GUEST_MEMFD_FLAG_DEFAULT_SHARED        (1ULL << 1)
>>>
>>>  struct kvm_create_guest_memfd {
>>>         __u64 size;
>>> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
>>> index b3ca6737f304..81b11a958c7a 100644
>>> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
>>> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
>>> @@ -274,7 +274,7 @@ static void test_guest_memfd(unsigned long vm_type)
>>>         vm = vm_create_barebones_type(vm_type);
>>>
>>>         if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
>>> -               flags |= GUEST_MEMFD_FLAG_MMAP;
>>> +               flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>>
>>>         test_create_guest_memfd_multiple(vm);
>>>         test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
>>> @@ -337,7 +337,8 @@ static void test_guest_memfd_guest(void)
>>>                     "Default VM type should always support guest_memfd mmap()");
>>>
>>>         size = vm->page_size;
>>> -       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
>>> +       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
>>> +                                            GUEST_MEMFD_FLAG_DEFAULT_SHARED);
>>>         vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
>>>
>>>         mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index 08a6bc7d25b6..19f05a45be04 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>>         if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>>>                 return VM_FAULT_SIGBUS;
>>>
>>> +       if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
>>> +               return VM_FAULT_SIGBUS;
>>> +
>>>         folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>>         if (IS_ERR(folio)) {
>>>                 int err = PTR_ERR(folio);
>>> @@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>>>         u64 valid_flags = 0;
>>>
>>>         if (kvm_arch_supports_gmem_mmap(kvm))
>>> -               valid_flags |= GUEST_MEMFD_FLAG_MMAP;
>>> +               valid_flags |= GUEST_MEMFD_FLAG_MMAP |
>>> +                              GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>
>> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
>> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
>> checking for that, at least until we have in-place conversion? Having
>> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP,
>> isn't a useful combination.
>>
> 
> I think it's okay to have the two flags be orthogonal from the start.

I dimly remember someone at one of the guest_memfd syncs bringing up a
use case for having a VMA even if all memory is private: not for
faulting anything in, but for madvise() or something similar.  Maybe it
was the NUMA stuff? (+Shivank)

So for that, having the flags be orthogonal would be useful even without
conversion support.

> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> 
>> That said, these are all nits, I'll leave it to you. With that:
>>
>> Reviewed-by: Fuad Tabba <tabba@google.com>
>> Tested-by: Fuad Tabba <tabba@google.com>
>>
>> Cheers,
>> /fuad
>>
>>
>>
>>>
>>>         if (flags & ~valid_flags)
>>>                 return -EINVAL;
>>> --
>>> 2.51.0.536.g15c5d4f767-goog
>>>

Best,
Patrick


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29 10:15       ` Patrick Roy
@ 2025-09-29 10:22         ` David Hildenbrand
  2025-09-29 10:51           ` Ackerley Tng
  0 siblings, 1 reply; 55+ messages in thread
From: David Hildenbrand @ 2025-09-29 10:22 UTC (permalink / raw)
  To: Patrick Roy, Ackerley Tng, Fuad Tabba, Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Kalyazin, Nikita, shivankg

                          GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>>
>>> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
>>> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
>>> checking for that, at least until we have in-place conversion? Having
>>> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP,
>>> isn't a useful combination.
>>>
>>
>> I think it's okay to have the two flags be orthogonal from the start.
> 
> I think I dimly remember someone at one of the guest_memfd syncs
> bringing up a usecase for having a VMA even if all memory is private,
> not for faulting anything in, but to do madvise or something? Maybe it
> was the NUMA stuff? (+Shivank)

Yes, that should be it. But we're never faulting in these pages, we only 
need the VMA (for the time being, until there is the in-place conversion).

-- 
Cheers

David / dhildenb



* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29 10:22         ` David Hildenbrand
@ 2025-09-29 10:51           ` Ackerley Tng
  2025-09-29 16:55             ` Sean Christopherson
  0 siblings, 1 reply; 55+ messages in thread
From: Ackerley Tng @ 2025-09-29 10:51 UTC (permalink / raw)
  To: David Hildenbrand, Patrick Roy, Fuad Tabba, Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Kalyazin, Nikita, shivankg

David Hildenbrand <david@redhat.com> writes:

>                           GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>>>
>>>> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
>>>> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
>>>> checking for that, at least until we have in-place conversion? Having
>>>> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP,
>>>> isn't a useful combination.
>>>>
>>>
>>> I think it's okay to have the two flags be orthogonal from the start.
>> 
>> I think I dimly remember someone at one of the guest_memfd syncs
>> bringing up a usecase for having a VMA even if all memory is private,
>> not for faulting anything in, but to do madvise or something? Maybe it
>> was the NUMA stuff? (+Shivank)
>
> Yes, that should be it. But we're never faulting in these pages, we only 
> need the VMA (for the time being, until there is the in-place conversion).
>

Yup, Sean's patch disables faulting if GUEST_MEMFD_FLAG_DEFAULT_SHARED
is not set, but mmap() is always enabled so madvise() still works.

Requiring GUEST_MEMFD_FLAG_DEFAULT_SHARED to be set together with
GUEST_MEMFD_FLAG_MMAP would still allow madvise() to work since
GUEST_MEMFD_FLAG_DEFAULT_SHARED only gates faulting.

To clarify, I'm still for making GUEST_MEMFD_FLAG_DEFAULT_SHARED
orthogonal to GUEST_MEMFD_FLAG_MMAP with no additional checks on top of
whatever's in this patch. :)
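
For illustration only (not part of the patch): a minimal sketch of the
orthogonal semantics described above, where MMAP alone decides whether a VMA
can be created and DEFAULT_SHARED additionally gates user faults.  The bit
values and the helper names gmem_can_mmap()/gmem_can_fault() are assumptions
made for this sketch, not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values for illustration; see include/uapi/linux/kvm.h. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_DEFAULT_SHARED	(1ULL << 1)

static_assert((GUEST_MEMFD_FLAG_MMAP & GUEST_MEMFD_FLAG_DEFAULT_SHARED) == 0,
	      "the two flags must be distinct bits");

/* Hypothetical helper: mmap() (and thus madvise()) needs only MMAP. */
static bool gmem_can_mmap(uint64_t flags)
{
	return flags & GUEST_MEMFD_FLAG_MMAP;
}

/*
 * Hypothetical helper: without in-place conversion, all memory follows the
 * default state, so a user fault needs a VMA (MMAP) and shared memory
 * (DEFAULT_SHARED).
 */
static bool gmem_can_fault(uint64_t flags)
{
	return gmem_can_mmap(flags) &&
	       (flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED);
}
```

With only MMAP set, gmem_can_mmap() is true and gmem_can_fault() is false,
which is the "VMA for madvise(), but no faulting" case.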

> -- 
> Cheers
>
> David / dhildenb

* Re: [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test
  2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
  2025-09-29  9:12   ` Fuad Tabba
  2025-09-29  9:17   ` David Hildenbrand
@ 2025-09-29 10:56   ` Ackerley Tng
  2025-09-29 16:58     ` Sean Christopherson
  2 siblings, 1 reply; 55+ messages in thread
From: Ackerley Tng @ 2025-09-29 10:56 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson

Sean Christopherson <seanjc@google.com> writes:

> Use a global variable to track the host page size in the guest_memfd test
> so that the information doesn't need to be constantly passed around.  The
> state is purely a reflection of the underlying system, i.e. can't be set
> by the test and is constant for a given invocation of the test, and thus
> explicitly passing the host page size to individual testcases adds no
> value, e.g. doesn't allow testing different combinations.
>

I was going to pass in page_size to each of these test cases to test
HugeTLB support, that's how page_size crept into the parameters of these
functions.

Could we do a getpagesize() within the gmem_test() macro that you
introduced instead?

Reviewed-by: Ackerley Tng <ackerleytng@google.com>

> Making page_size a global will simplify an upcoming change to create a new
> guest_memfd instance per testcase.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 37 +++++++++----------
>  1 file changed, 18 insertions(+), 19 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 81b11a958c7a..8251d019206a 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -24,6 +24,8 @@
>  #include "test_util.h"
>  #include "ucall_common.h"
>  
> +static size_t page_size;
> +
>  static void test_file_read_write(int fd)
>  {
>  	char buf[64];
> @@ -38,7 +40,7 @@ static void test_file_read_write(int fd)
>  		    "pwrite on a guest_mem fd should fail");
>  }
>  
> -static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
> +static void test_mmap_supported(int fd, size_t total_size)
>  {
>  	const char val = 0xaa;
>  	char *mem;
> @@ -78,7 +80,7 @@ void fault_sigbus_handler(int signum)
>  	siglongjmp(jmpbuf, 1);
>  }
>  
> 
> [...snip...]
> 

* Re: [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase
  2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
  2025-09-29  9:18   ` David Hildenbrand
  2025-09-29  9:24   ` Fuad Tabba
@ 2025-09-29 11:02   ` Ackerley Tng
  2 siblings, 0 replies; 55+ messages in thread
From: Ackerley Tng @ 2025-09-29 11:02 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson, Lisa Wang

Sean Christopherson <seanjc@google.com> writes:

> Refactor the guest_memfd selftest to improve test isolation by creating a
> new guest_memfd for each testcase.  Currently, the test reuses a single
> guest_memfd instance for all testcases, and thus creates dependencies
> between tests, e.g. not truncating folios from the guest_memfd instance
> at the end of a test could lead to unexpected results (see the PUNCH_HOLE
> purging that needs to be done by the in-flight NUMA testcases[1]).
>

Lisa and I ran into this recently while working on testing
memory_failure() handling for guest_memfd too.

> Invoke each test via a macro wrapper to create and close a guest_memfd
> to cut down on the boilerplate copy+paste needed to create a test.
>

I introduced a wrapper function but a macro is a better idea since it
also parametrizes the test name.
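
For illustration only: a standalone sketch of why a macro can parametrize the
test name where a function cannot.  Token pasting (test_##__test) selects the
testcase and stringification (#__test) yields its name.  create_test_fd(),
fd_test(), last_test, nr_tests and test_file_size() are hypothetical names for
this sketch; an unlinked temp file stands in for a guest_memfd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for vm_create_guest_memfd(): an unlinked temp file. */
static int create_test_fd(size_t size)
{
	char path[] = "/tmp/fd_test_XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0 || unlink(path) || ftruncate(fd, size))
		abort();
	return fd;
}

static const char *last_test;	/* name of the most recent testcase */
static int nr_tests;		/* number of testcases run so far */

/* Create a fresh fd, run test_<name>() on it, then close the fd. */
#define fd_test(__test, __size)						\
do {									\
	int fd = create_test_fd(__size);				\
									\
	last_test = #__test;						\
	test_##__test(fd, __size);					\
	close(fd);							\
	nr_tests++;							\
} while (0)

/* Example testcase: the fd handed in must have the requested size. */
static void test_file_size(int fd, size_t size)
{
	struct stat st;

	if (fstat(fd, &st))
		abort();
	assert((size_t)st.st_size == size);
}
```

Calling fd_test(file_size, 8192) runs test_file_size() on a fresh fd and
records "file_size" as the name, so no state leaks between testcases.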

Reviewed-by: Ackerley Tng <ackerleytng@google.com>

> Link: https://lore.kernel.org/all/20250827175247.83322-10-shivankg@amd.com
> Reported-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 31 ++++++++++---------
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 8251d019206a..60c6dec63490 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -26,7 +26,7 @@
>  
>  static size_t page_size;
>  
> -static void test_file_read_write(int fd)
> +static void test_file_read_write(int fd, size_t total_size)
>  {
>  	char buf[64];
>  
> @@ -259,14 +259,18 @@ static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
>  	}
>  }
>  
> +#define gmem_test(__test, __vm, __flags)				\
> +do {									\
> +	int fd = vm_create_guest_memfd(__vm, page_size * 4, __flags);	\
> +									\
> +	test_##__test(fd, page_size * 4);				\
> +	close(fd);							\
> +} while (0)
> +
> 
> [...snip...]
> 

* Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
  2025-09-29  9:24   ` Fuad Tabba
  2025-09-29  9:28   ` David Hildenbrand
@ 2025-09-29 11:08   ` Ackerley Tng
  2025-09-29 17:32     ` Sean Christopherson
  2 siblings, 1 reply; 55+ messages in thread
From: Ackerley Tng @ 2025-09-29 11:08 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson

Sean Christopherson <seanjc@google.com> writes:

> Add and use wrappers for mmap() and munmap() that assert success to reduce
> a significant amount of boilerplate code, to ensure all tests assert on
> failure, and to provide consistent error messages on failure.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 21 +++------
>  .../testing/selftests/kvm/include/kvm_util.h  | 25 +++++++++++
>  tools/testing/selftests/kvm/lib/kvm_util.c    | 44 +++++++------------
>  tools/testing/selftests/kvm/mmu_stress_test.c |  5 +--
>  .../selftests/kvm/s390/ucontrol_test.c        | 16 +++----
>  .../selftests/kvm/set_memory_region_test.c    | 17 ++++---
>  6 files changed, 64 insertions(+), 64 deletions(-)
>
> 
> [...snip...]
> 
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 23a506d7eca3..1c68ff0fb3fb 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -278,6 +278,31 @@ static inline bool kvm_has_cap(long cap)
>  #define __KVM_SYSCALL_ERROR(_name, _ret) \
>  	"%s failed, rc: %i errno: %i (%s)", (_name), (_ret), errno, strerror(errno)
>  
> +static inline void *__kvm_mmap(size_t size, int prot, int flags, int fd,
> +			       off_t offset)

Do you have a policy/rationale for putting this in kvm_util.h as opposed
to test_util.h? I like the idea of this wrapper but I thought this is
less of a kvm thing and more of a test utility, and hence it belongs in
test_util.c and test_util.h.

Also, the name kind of associates mmap with KVM too closely IMO, but
test_mmap() is not a great name either.

No strong opinions here.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>

> +{
> +	void *mem;
> +
> +	mem = mmap(NULL, size, prot, flags, fd, offset);
> +	TEST_ASSERT(mem != MAP_FAILED, __KVM_SYSCALL_ERROR("mmap()",
> +		    (int)(unsigned long)MAP_FAILED));
> +
> +	return mem;
> +}
> +
> +static inline void *kvm_mmap(size_t size, int prot, int flags, int fd)
> +{
> +	return __kvm_mmap(size, prot, flags, fd, 0);
> +}
> +
> 
> [...snip...]
> 

* Re: [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
  2025-09-29  9:24   ` Fuad Tabba
  2025-09-29  9:28   ` David Hildenbrand
@ 2025-09-29 14:38   ` Ackerley Tng
  2025-09-29 18:10     ` Sean Christopherson
  2 siblings, 1 reply; 55+ messages in thread
From: Ackerley Tng @ 2025-09-29 14:38 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda
  Cc: kvm, linux-kernel, David Hildenbrand, Fuad Tabba,
	Sean Christopherson

Sean Christopherson <seanjc@google.com> writes:

> Add a guest_memfd testcase to verify that faulting in private memory gets
> a SIGBUS.  For now, test only the case where memory is private by default
> since KVM doesn't yet support in-place conversion.
>
> Cc: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 62 ++++++++++++++-----
>  1 file changed, 46 insertions(+), 16 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 5dd40b77dc07..b5a631aca933 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -40,17 +40,26 @@ static void test_file_read_write(int fd, size_t total_size)
>  		    "pwrite on a guest_mem fd should fail");
>  }
>  

I feel that the tests should be grouped by the concepts being tested:

+ test_cow_not_supported()
    + mmap() should fail
+ test_mmap_supported()
    + kvm_mmap()
    + regular, successful accesses to offsets within the size of the fd
    + kvm_munmap()
+ test_fault_overflow()
    + kvm_mmap()
    + a helper (perhaps "assert_fault_sigbus(char *mem)"?) that purely
      tries to access beyond the size of the fd and catches SIGBUS
    + regular, successful accesses to offsets within the size of the fd
    + kvm_munmap()
+ test_fault_private()
    + kvm_mmap()
    + a helper (perhaps "assert_fault_sigbus(char *mem)"?) that purely
      tries to access within the size of the fd and catches SIGBUS
    + kvm_munmap()

I think some code duplication in tests is okay if it makes the test flow
more obvious.
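
For illustration only: a rough standalone sketch of the kind of helper
suggested above, built on the same sigsetjmp()/siglongjmp() pattern the test
already uses.  The name access_raises_sigbus() and the bool return value are
assumptions for this sketch (an assert_fault_sigbus() could simply assert on
the result).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jmpbuf;

static void fault_sigbus_handler(int signum)
{
	/* Jump back to the sigsetjmp() call site, which then returns 1. */
	siglongjmp(jmpbuf, 1);
}

/* Hypothetical helper: return true if writing one byte at @mem hits SIGBUS. */
static bool access_raises_sigbus(volatile char *mem)
{
	struct sigaction sa_old, sa_new = {
		.sa_handler = fault_sigbus_handler,
	};
	/* volatile so the value is well-defined after siglongjmp(). */
	volatile bool faulted = true;

	sigaction(SIGBUS, &sa_new, &sa_old);
	if (sigsetjmp(jmpbuf, 1) == 0) {
		*mem = 0xaa;
		faulted = false;	/* Reached only if the write succeeded. */
	}
	sigaction(SIGBUS, &sa_old, NULL);

	return faulted;
}
```

A quick way to exercise it without guest_memfd is a MAP_SHARED mapping of a
regular file that is longer than the file: a write to the first page should
succeed, while a write to a page entirely beyond EOF should raise SIGBUS.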

> -static void test_mmap_supported(int fd, size_t total_size)
> +static void *test_mmap_common(int fd, size_t size)
>  {
> -	const char val = 0xaa;
> -	char *mem;
> -	size_t i;
> -	int ret;
> +	void *mem;
>  
> -	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> +	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
>  	TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
>

When grouped this way, test_mmap_common() tests that MAP_PRIVATE or COW
is not allowed twice, once in test_mmap_supported() and once in
test_fault_sigbus(). Is that intentional?

> -	mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> +	mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> +
> +	return mem;

I feel that returning (and using) the userspace address from a test
(test_mmap_common()) is a little hard to follow.

> +}
> +
> +static void test_mmap_supported(int fd, size_t total_size)
> +{
> +	const char val = 0xaa;
> +	char *mem;
> +	size_t i;
> +	int ret;
> +
> +	mem = test_mmap_common(fd, total_size);
>  
>  	memset(mem, val, total_size);
>  	for (i = 0; i < total_size; i++)
> @@ -78,31 +87,47 @@ void fault_sigbus_handler(int signum)
>  	siglongjmp(jmpbuf, 1);
>  }
>  
> -static void test_fault_overflow(int fd, size_t total_size)
> +static void *test_fault_sigbus(int fd, size_t size)
>  {
>  	struct sigaction sa_old, sa_new = {
>  		.sa_handler = fault_sigbus_handler,
>  	};
> -	size_t map_size = total_size * 4;
> -	const char val = 0xaa;
> -	char *mem;
> -	size_t i;
> +	void *mem;
>  
> -	mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> +	mem = test_mmap_common(fd, size);
>  
>  	sigaction(SIGBUS, &sa_new, &sa_old);
>  	if (sigsetjmp(jmpbuf, 1) == 0) {
> -		memset(mem, 0xaa, map_size);
> +		memset(mem, 0xaa, size);
>  		TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
>  	}
>  	sigaction(SIGBUS, &sa_old, NULL);
>  
> +	return mem;

I think returning the userspace address from a test is a little hard to
follow. This one feels even more unexpected because a valid address is
being returned (and used) from a test that has sigbus in its name.

> +}
> +
> +static void test_fault_overflow(int fd, size_t total_size)
> +{
> +	size_t map_size = total_size * 4;
> +	const char val = 0xaa;
> +	char *mem;
> +	size_t i;
> +
> +	mem = test_fault_sigbus(fd, map_size);
> +
>  	for (i = 0; i < total_size; i++)
>  		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
>  
>  	kvm_munmap(mem, map_size);
>  }
>  
> +static void test_fault_private(int fd, size_t total_size)
> +{
> +	void *mem = test_fault_sigbus(fd, total_size);
> +
> +	kvm_munmap(mem, total_size);
> +}
> +

Testing that faults fail when GUEST_MEMFD_FLAG_DEFAULT_SHARED is not set
is a good idea. Perhaps it could be even clearer if further split up:

+ test_mmap_supported()
    + kvm_mmap()
    + kvm_munmap()
+ test_mmap_supported_fault_supported()
    + kvm_mmap()
    + successful accesses to offsets within the size of the fd
    + kvm_munmap()
+ test_mmap_supported_fault_sigbus()
    + kvm_mmap()
    + expect SIGBUS from accesses to offsets within the size of the fd
    + kvm_munmap()

>  static void test_mmap_not_supported(int fd, size_t total_size)
>  {
>  	char *mem;
> @@ -274,9 +299,12 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>  
>  	gmem_test(file_read_write, vm, flags);
>  
> -	if (flags & GUEST_MEMFD_FLAG_MMAP) {
> +	if (flags & GUEST_MEMFD_FLAG_MMAP &&
> +	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
>  		gmem_test(mmap_supported, vm, flags);
>  		gmem_test(fault_overflow, vm, flags);
> +	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
> +		gmem_test(fault_private, vm, flags);

test_fault_private() makes me think the test is testing for private
faults, but there's nothing private about this fault, and the fault
doesn't even come from the guest.

>  	} else {
>  		gmem_test(mmap_not_supported, vm, flags);
>  	}

If split up as described above, this could be

	if (flags & GUEST_MEMFD_FLAG_MMAP &&
	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
		gmem_test(mmap_supported_fault_supported, vm, flags);
		gmem_test(fault_overflow, vm, flags);
	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
		gmem_test(mmap_supported_fault_sigbus, vm, flags);
	} else {
		gmem_test(mmap_not_supported, vm, flags);
	}

> @@ -294,9 +322,11 @@ static void test_guest_memfd(unsigned long vm_type)
>  
>  	__test_guest_memfd(vm, 0);
>  
> -	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> +	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP)) {
> +		__test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP);
>  		__test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP |
>  				       GUEST_MEMFD_FLAG_DEFAULT_SHARED);
> +	}
>  
>  	kvm_vm_free(vm);
>  }

I could send a revision, if you agree/prefer!

Reviewed-by: Ackerley Tng <ackerleytng@google.com>

> -- 
> 2.51.0.536.g15c5d4f767-goog

* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29  9:43     ` Ackerley Tng
  2025-09-29 10:15       ` Patrick Roy
@ 2025-09-29 16:54       ` Sean Christopherson
  1 sibling, 0 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-29 16:54 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Fuad Tabba, Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand, roypat,
	Nikita Kalyazin

On Mon, Sep 29, 2025, Ackerley Tng wrote:
> Fuad Tabba <tabba@google.com> writes:
> 
> > Hi Sean,
> >
> > On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
> >>
> >> Add a guest_memfd flag to allow userspace to state that the underlying
> >> memory should be configured to be shared by default, and reject user page
> >> faults if the guest_memfd instance's memory isn't shared by default.
> >> Because KVM doesn't yet support in-place private<=>shared conversions, all
> >> guest_memfd memory effectively follows the default state.
> >>
> >> Alternatively, KVM could deduce the default state based on MMAP, which for
> >> all intents and purposes is what KVM currently does.  However, implicitly
> >> deriving the default state based on MMAP will result in a messy ABI when
> >> support for in-place conversions is added.
> >>
> >> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
> >> by default (otherwise the memory would be unusable).  If MMAP implies
> >> memory is shared by default, then the default state for CoCo VMs will vary
> >> based on MMAP, and from userspace's perspective, will change when in-place
> >> conversion support is added.  I.e. to maintain guest<=>host ABI, userspace
> >> would need to immediately convert all memory from shared=>private, which
> >> is both ugly and inefficient.  The inefficiency could be avoided by adding
> >> a flag to state that memory is _private_ by default, irrespective of MMAP,
> >> but that would lead to an equally messy and hard to document ABI.
> >>
> >> Bite the bullet and immediately add a flag to control the default state so
> >> that the effective behavior is explicit and straightforward.
> >>
> 
> I like having this flag, but didn't propose this because I thought folks
> depending on the default being shared (Patrick/Nikita) might have their
> usage broken.

mmap() support hasn't landed upstream, so as far as the upstream kernel is
concerned, there is no userspace to break.  Which is exactly why I want to land
this (or something like it) in 6.18, before GUEST_MEMFD_FLAG_MMAP is officially
released.

> >> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Fuad Tabba <tabba@google.com>
> >> Signed-off-by: Sean Christopherson <seanjc@google.com>
> >> ---
> >>  Documentation/virt/kvm/api.rst                 | 10 ++++++++--
> >>  include/uapi/linux/kvm.h                       |  3 ++-
> >>  tools/testing/selftests/kvm/guest_memfd_test.c |  5 +++--
> >>  virt/kvm/guest_memfd.c                         |  6 +++++-
> >>  4 files changed, 18 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> index c17a87a0a5ac..4dfe156bbe3c 100644
> >> --- a/Documentation/virt/kvm/api.rst
> >> +++ b/Documentation/virt/kvm/api.rst
> >> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
> >>  a single guest_memfd file, but the bound ranges must not overlap).
> >>
> >>  When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
> >> -supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
> >> -enables mmap() and faulting of guest_memfd memory to host userspace.
> >> +supports GUEST_MEMFD_FLAG_MMAP and  GUEST_MEMFD_FLAG_DEFAULT_SHARED.  Setting
> >
> > There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.
> >
> 
> +1 on this. Also, would you consider putting the concept of "at creation
> time" or "at initialization time" into the name of the flag?

Yah, GUEST_MEMFD_FLAG_INIT_SHARED?

> "Default" could be interpreted as "whenever a folio is allocated for
> this guest_memfd", the memory the folio represents is by default
> shared.
> 
> What we want to represent is that when the guest_memfd is created,
> memory at all indices is initialized as shared.
> 
> Looking a bit further, when conversion is supported, if this flag is not
> specified, then all the indices are initialized as private, right?

Correct, which is the current (pre-6.18) behavior.

> >> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
> >> +memory to host userspace (so long as the memory is currently shared).  Setting
> >> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
> >> +by default).  Note!  Because KVM doesn't yet support in-place private<=>shared
> >> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
> >> +userspace page tables.  This limitation will go away when in-place conversions
> >> +are supported.
> >
> > I think that a more accurate (and future proof) description of the
> > mmap flag could be something along the lines of:
> >
> 
> +1 on these suggestions, I agree that making the concepts of SHARED vs
> MMAP orthogonal from the start is more future proof.
> 
> > + Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.
> >
> > + Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
> > + by default
> 
> See above, I'd prefer clarifying this as "at initialization time" or
> something similar.

Roger that.

> > At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and GUEST_MEMFD_FLAG_MMAP
> > don't make sense without each other. Is it worth checking for that, at
> > least until we have in-place conversion? Having only
> > GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP, isn't a
> > useful combination.

Heh, that's exactly how I coded things up to start:

        /*
         * TODO: Drop the restriction that memory must be shared by default
         *       once in-place conversions are supported.
         */
        if (flags & GUEST_MEMFD_FLAG_MMAP &&
            !(flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
                return -EINVAL;

but if we go that route, then dropping the restriction would result in an ABI
change for non-CoCo VMs.  The odds of such an ABI change breaking userspace are
basically zero, but I couldn't think of any reason to risk it; userspace would
need to specify MMAP+SHARED either way.

And on the flip side, not enforcing the flags at the time of creation allows us
to test that user page faults to private memory are rejected.  It's not a ton of
meaningful coverage, but it's not nothing either.  And from a code perspective,
the diffs when in-place conversions are added are quite nice, as the concepts
don't change (user faults to private memory are disallowed), only the mechanics
change, i.e. the diffs highlight what all needs to happen to support conversions
without the extra noise of a change in overall semantics.

* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29 10:51           ` Ackerley Tng
@ 2025-09-29 16:55             ` Sean Christopherson
  2025-09-30  0:15               ` Sean Christopherson
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-09-29 16:55 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: David Hildenbrand, Patrick Roy, Fuad Tabba, Paolo Bonzini,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
	linux-kernel, Nikita Kalyazin, shivankg

On Mon, Sep 29, 2025, Ackerley Tng wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
> >                           GUEST_MEMFD_FLAG_DEFAULT_SHARED;
> >>>>
> >>>> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
> >>>> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
> >>>> checking for that, at least until we have in-place conversion? Having
> >>>> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP,
> >>>> isn't a useful combination.
> >>>>
> >>>
> >>> I think it's okay to have the two flags be orthogonal from the start.
> >> 
> >> I think I dimly remember someone at one of the guest_memfd syncs
> >> bringing up a usecase for having a VMA even if all memory is private,
> >> not for faulting anything in, but to do madvise or something? Maybe it
> >> was the NUMA stuff? (+Shivank)
> >
> > Yes, that should be it. But we're never faulting in these pages, we only 
> > need the VMA (for the time being, until there is the in-place conversion).
> >
> 
> Yup, Sean's patch disables faulting if GUEST_MEMFD_FLAG_DEFAULT_SHARED
> is not set, but mmap() is always enabled so madvise() still works.

Hah!  I totally intended that :-D

> Requiring GUEST_MEMFD_FLAG_DEFAULT_SHARED to be set together with
> GUEST_MEMFD_FLAG_MMAP would still allow madvise() to work since
> GUEST_MEMFD_FLAG_DEFAULT_SHARED only gates faulting.
> 
> To clarify, I'm still for making GUEST_MEMFD_FLAG_DEFAULT_SHARED
> orthogonal to GUEST_MEMFD_FLAG_MMAP with no additional checks on top of
> whatever's in this patch. :)


* Re: [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test
  2025-09-29 10:56   ` Ackerley Tng
@ 2025-09-29 16:58     ` Sean Christopherson
  2025-09-30  6:52       ` Ackerley Tng
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-09-29 16:58 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

On Mon, Sep 29, 2025, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > Use a global variable to track the host page size in the guest_memfd test
> > so that the information doesn't need to be constantly passed around.  The
> > state is purely a reflection of the underlying system, i.e. can't be set
> > by the test and is constant for a given invocation of the test, and thus
> > explicitly passing the host page size to individual testcases adds no
> > value, e.g. doesn't allow testing different combinations.
> >
> 
> I was going to pass in page_size to each of these test cases to test
> HugeTLB support, that's how page_size crept into the parameters of these
> functions.
> 
> Could we do a getpagesize() within the gmem_test() macro that you
> introduced instead?

We could, and I actually had it that way to start.  But I found that burying the
effective setting of page_size made it harder to see that it's a runtime constant,
versus something that can be configured by the test.
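
For illustration only: a tiny sketch of the "runtime constant" pattern, where
the page size is queried once up front and then only read.  init_page_size()
and test_region_size() are hypothetical names; the sketch uses
sysconf(_SC_PAGESIZE) rather than the getpagesize() mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Set once at startup, read-only afterwards. */
static size_t page_size;

/* Hypothetical init hook, meant to be called once before any testcase. */
static void init_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	assert(sz > 0);
	page_size = sz;
}

/* Testcases derive sizes from the global instead of taking a parameter. */
static size_t test_region_size(void)
{
	return page_size * 4;
}
```

Because the assignment happens in exactly one visible place, it stays obvious
that individual testcases cannot vary the value.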

* Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  2025-09-29 11:08   ` Ackerley Tng
@ 2025-09-29 17:32     ` Sean Christopherson
  2025-09-30  7:09       ` Ackerley Tng
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-09-29 17:32 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

On Mon, Sep 29, 2025, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > Add and use wrappers for mmap() and munmap() that assert success to reduce
> > a significant amount of boilerplate code, to ensure all tests assert on
> > failure, and to provide consistent error messages on failure.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  .../testing/selftests/kvm/guest_memfd_test.c  | 21 +++------
> >  .../testing/selftests/kvm/include/kvm_util.h  | 25 +++++++++++
> >  tools/testing/selftests/kvm/lib/kvm_util.c    | 44 +++++++------------
> >  tools/testing/selftests/kvm/mmu_stress_test.c |  5 +--
> >  .../selftests/kvm/s390/ucontrol_test.c        | 16 +++----
> >  .../selftests/kvm/set_memory_region_test.c    | 17 ++++---
> >  6 files changed, 64 insertions(+), 64 deletions(-)
> >
> > 
> > [...snip...]
> > 
> > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> > index 23a506d7eca3..1c68ff0fb3fb 100644
> > --- a/tools/testing/selftests/kvm/include/kvm_util.h
> > +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> > @@ -278,6 +278,31 @@ static inline bool kvm_has_cap(long cap)
> >  #define __KVM_SYSCALL_ERROR(_name, _ret) \
> >  	"%s failed, rc: %i errno: %i (%s)", (_name), (_ret), errno, strerror(errno)
> >  
> > +static inline void *__kvm_mmap(size_t size, int prot, int flags, int fd,
> > +			       off_t offset)
> 
> Do you have a policy/rationale for putting this in kvm_util.h as opposed
> to test_util.h? I like the idea of this wrapper but I thought this is
> less of a kvm thing and more of a test utility, and hence it belongs in
> test_util.c and test_util.h.

To be perfectly honest, I forgot test_util.h existed :-)

> Also, the name kind of associates mmap with KVM too closely IMO, but
> test_mmap() is not a great name either.

Which file will hopefully be irrelevant, because ideally it'll be temporary (see
below). But if someone has a strong opinion and/or better idea on the name prefix,
I definitely want to settle on a name for syscall wrappers, because I want to go
much further than just adding an mmap() wrapper.  I chose kvm_ because there's
basically zero chance that will ever conflict with generic selftests functionality,
and the wrappers utilize TEST_ASSERT(), which is unique to KVM selftests.
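
For illustration only: a standalone sketch of an assert-on-failure mmap()
wrapper in the spirit of __kvm_mmap()/kvm_mmap().  It uses plain
fprintf()+abort() instead of TEST_ASSERT() so that it builds outside the
selftests tree; checked_mmap_at() and checked_mmap() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>

/* mmap() that aborts with a uniform message instead of returning MAP_FAILED. */
static inline void *checked_mmap_at(size_t size, int prot, int flags, int fd,
				    off_t offset)
{
	void *mem;

	assert(size > 0);
	mem = mmap(NULL, size, prot, flags, fd, offset);
	if (mem == MAP_FAILED) {
		fprintf(stderr, "mmap() failed, errno: %i (%s)\n",
			errno, strerror(errno));
		abort();
	}
	return mem;
}

/* Common case: map from offset 0. */
static inline void *checked_mmap(size_t size, int prot, int flags, int fd)
{
	return checked_mmap_at(size, prot, flags, fd, 0);
}
```

Note that mmap() signals failure via MAP_FAILED rather than a non-zero return
code, so it needs a hand-written wrapper rather than a generic "assert rc == 0"
helper.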

As for why the current location will hopefully be temporary, and why I want to
settle on a name, I have patches to add several more wrappers, along with
infrastructure to make it super easy to add new wrappers.  When trying to sort
out the libnuma stuff for Shivank's series[*], I discovered that KVM selftests
already has a (very partial, very crappy) libnuma equivalent in
tools/testing/selftests/kvm/include/numaif.h.

Adding wrappers for NUMA syscalls became an exercise in frustration (so much
uninteresting boilerplate, and I kept making silly mistakes), and so that combined
with the desire for mmap() and munmap() wrappers motivated me to add a macro
framework similar to the kernel's DEFINE_SYSCALL magic.

So, I've got patches (that I'll post with the next version of the gmem NUMA
series) that add tools/testing/selftests/kvm/include/kvm_syscalls.h, and
__kvm_mmap() will be moved there (ideally it wouldn't move, but I want to land
this small series in 6.18, and so wanted to keep the changes for 6.18 small-ish).

For lack of a better namespace, and because we already have __KVM_SYSCALL_ERROR(),
I picked KVM_SYSCALL_DEFINE() for the "standard" builder, e.g. libnuma equivalents,
and then __KVM_SYSCALL_DEFINE() for a KVM selftests specific version to handle
asserting success.

/* Define a kvm_<syscall>() API to assert success. */
#define __KVM_SYSCALL_DEFINE(name, nr_args, args...)			\
static inline void kvm_##name(DECLARE_ARGS(nr_args, args))		\
{									\
	int r;								\
									\
	r = name(UNPACK_ARGS(nr_args, args));				\
	TEST_ASSERT(!r, __KVM_SYSCALL_ERROR(#name, r));			\
}

/*
 * Macro to define syscall APIs, either because KVM selftests doesn't link to
 * the standard library, e.g. libnuma, or because there is no library that yet
 * provides the syscall.  These
 */
#define KVM_SYSCALL_DEFINE(name, nr_args, args...)			\
static inline long name(DECLARE_ARGS(nr_args, args))			\
{									\
	return syscall(__NR_##name, UNPACK_ARGS(nr_args, args));	\
}									\
__KVM_SYSCALL_DEFINE(name, nr_args, args)

The usage looks like this (which is odd at first glance, but makes it trivially
easy to copy+paste from the kernel's SYSCALL_DEFINE invocations):

KVM_SYSCALL_DEFINE(get_mempolicy, 5, int *, policy, const unsigned long *, nmask,
		   unsigned long, maxnode, void *, addr, int, flags);

KVM_SYSCALL_DEFINE(set_mempolicy, 3, int, mode, const unsigned long *, nmask,
		   unsigned long, maxnode);

KVM_SYSCALL_DEFINE(set_mempolicy_home_node, 4, unsigned long, start,
		   unsigned long, len, unsigned long, home_node,
		   unsigned long, flags);

KVM_SYSCALL_DEFINE(migrate_pages, 4, int, pid, unsigned long, maxnode,
		   const unsigned long *, frommask, const unsigned long *, tomask);

KVM_SYSCALL_DEFINE(move_pages, 6, int, pid, unsigned long, count, void *, pages,
		   const int *, nodes, int *, status, int, flags);

KVM_SYSCALL_DEFINE(mbind, 6, void *, addr, unsigned long, size, int, mode,
		   const unsigned long *, nodemask, unsigned long, maxnode,
		   unsigned int, flags);

__KVM_SYSCALL_DEFINE(munmap, 2, void *, mem, size_t, size);
__KVM_SYSCALL_DEFINE(close, 1, int, fd);
__KVM_SYSCALL_DEFINE(fallocate, 4, int, fd, int, mode, loff_t, offset, loff_t, len);
__KVM_SYSCALL_DEFINE(ftruncate, 2, unsigned int, fd, off_t, length);

[*] https://lore.kernel.org/all/0e986bdb-7d1b-4c14-932e-771a87532947@amd.com
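
For readers who want to see how the argument plumbing could work, here is a
minimal userspace sketch.  Everything prefixed DEMO_/demo_ is hypothetical: the
real DECLARE_ARGS()/UNPACK_ARGS() helpers are not shown in this thread, the
sketch only handles one or two arguments, and it calls abort() where the
selftests would use TEST_ASSERT().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers: turn "type, name" pairs into a parameter list... */
#define DEMO_DECLARE_ARGS_1(t1, a1)		t1 a1
#define DEMO_DECLARE_ARGS_2(t1, a1, t2, a2)	t1 a1, t2 a2
#define DEMO_DECLARE_ARGS(nr_args, ...)	DEMO_DECLARE_ARGS_##nr_args(__VA_ARGS__)

/* ...or into the bare argument names used to forward the call. */
#define DEMO_UNPACK_ARGS_1(t1, a1)		a1
#define DEMO_UNPACK_ARGS_2(t1, a1, t2, a2)	a1, a2
#define DEMO_UNPACK_ARGS(nr_args, ...)	DEMO_UNPACK_ARGS_##nr_args(__VA_ARGS__)

/* Define demo_<name>(), which forwards to <name>() and aborts on failure. */
#define DEMO_SYSCALL_DEFINE(name, nr_args, ...)				\
static inline void demo_##name(DEMO_DECLARE_ARGS(nr_args, __VA_ARGS__))	\
{									\
	int r = name(DEMO_UNPACK_ARGS(nr_args, __VA_ARGS__));		\
									\
	if (r) {							\
		fprintf(stderr, "%s failed, rc: %i errno: %i (%s)\n",	\
			#name, r, errno, strerror(errno));		\
		abort();						\
	}								\
}

/* Generates demo_close(int fd) and demo_ftruncate(int fd, off_t length). */
DEMO_SYSCALL_DEFINE(close, 1, int, fd)
DEMO_SYSCALL_DEFINE(ftruncate, 2, int, fd, off_t, length)
```

The per-arity helper macros are the boring part; the payoff is that each new
wrapper is a one-liner whose argument list can be pasted from the kernel.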

* Re: [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-29 14:38   ` Ackerley Tng
@ 2025-09-29 18:10     ` Sean Christopherson
  2025-09-29 18:35       ` Sean Christopherson
  2025-09-30  7:53       ` Ackerley Tng
  0 siblings, 2 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-29 18:10 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

On Mon, Sep 29, 2025, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > Add a guest_memfd testcase to verify that faulting in private memory gets
> > a SIGBUS.  For now, test only the case where memory is private by default
> > since KVM doesn't yet support in-place conversion.
> >
> > Cc: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  .../testing/selftests/kvm/guest_memfd_test.c  | 62 ++++++++++++++-----
> >  1 file changed, 46 insertions(+), 16 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> > index 5dd40b77dc07..b5a631aca933 100644
> > --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> > +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> > @@ -40,17 +40,26 @@ static void test_file_read_write(int fd, size_t total_size)
> >  		    "pwrite on a guest_mem fd should fail");
> >  }
> >  
> 
> I feel that the tests should be grouped by concepts being tested
> 
> + test_cow_not_supported()
>     + mmap() should fail
> + test_mmap_supported()
>     + kvm_mmap()
>     + regular, successful accesses to offsets within the size of the fd
>     + kvm_munmap()
> + test_fault_overflow()
>     + kvm_mmap()
>     + a helper (perhaps "assert_fault_sigbus(char *mem)"?) that purely
>       tries to access beyond the size of the fd and catches SIGBUS
>     + regular, successful accesses to offsets within the size of the fd
>     + kvm_munmap()
> + test_fault_private()
>     + kvm_mmap()
>     + a helper (perhaps "assert_fault_sigbus(char *mem)"?) that purely
>       tries to access within the size of the fd and catches SIGBUS
>     + kvm_munmap()
> 
> I think some code duplication in tests is okay if it makes the test flow
> more obvious.

Yeah, depends on what is being duplicated, and how much.

> > -static void test_mmap_supported(int fd, size_t total_size)
> > +static void *test_mmap_common(int fd, size_t size)
> >  {
> > -	const char val = 0xaa;
> > -	char *mem;
> > -	size_t i;
> > -	int ret;
> > +	void *mem;
> >  
> > -	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> > +	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> >  	TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
> >
> 
> When grouped this way, test_mmap_common() tests that MAP_PRIVATE or COW
> is not allowed twice, once in test_mmap_supported() and once in
> test_fault_sigbus(). Is that intentional?

Hmm, no?  I suspect I just lost track of what was being tested.

> > -	mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> > +	mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> > +
> > +	return mem;
> 
> I feel that returning (and using) the userspace address from a test
> (test_mmap_common()) is a little hard to follow.

Agreed.  Should be easy enough to eliminate this helper.

> > -static void test_fault_overflow(int fd, size_t total_size)
> > +static void *test_fault_sigbus(int fd, size_t size)
> >  {
> >  	struct sigaction sa_old, sa_new = {
> >  		.sa_handler = fault_sigbus_handler,
> >  	};
> > -	size_t map_size = total_size * 4;
> > -	const char val = 0xaa;
> > -	char *mem;
> > -	size_t i;
> > +	void *mem;
> >  
> > -	mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> > +	mem = test_mmap_common(fd, size);
> >  
> >  	sigaction(SIGBUS, &sa_new, &sa_old);
> >  	if (sigsetjmp(jmpbuf, 1) == 0) {
> > -		memset(mem, 0xaa, map_size);
> > +		memset(mem, 0xaa, size);
> >  		TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
> >  	}
> >  	sigaction(SIGBUS, &sa_old, NULL);
> >  
> > +	return mem;
> 
> I think returning the userspace address from a test is a little hard to
> follow. This one feels even more unexpected because a valid address is
> being returned (and used) from a test that has sigbus in its name.

Yeah, and it's fugly all around.  If we pass in the "accessible" size, then we
can reduce the amount of copy+paste, eliminate the weird return and split mmap()
versus munmap(), and get bonus coverage that reads SIGBUS as well.

How's this look?

static void test_fault_sigbus(int fd, size_t accessible_size, size_t mmap_size)
{
	struct sigaction sa_old, sa_new = {
		.sa_handler = fault_sigbus_handler,
	};
	const uint8_t val = 0xaa;
	uint8_t *mem;
	size_t i;

	mem = kvm_mmap(mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);

	sigaction(SIGBUS, &sa_new, &sa_old);
	if (sigsetjmp(jmpbuf, 1) == 0) {
		memset(mem, val, mmap_size);
		TEST_FAIL("memset() should have triggered SIGBUS");
	}
	if (sigsetjmp(jmpbuf, 1) == 0) {
		(void)READ_ONCE(mem[accessible_size]);
		TEST_FAIL("load at first unaccessible byte should have triggered SIGBUS");
	}
	sigaction(SIGBUS, &sa_old, NULL);

	for (i = 0; i < accessible_size; i++)
		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);

	kvm_munmap(mem, mmap_size);
}

static void test_fault_overflow(int fd, size_t total_size)
{
	test_fault_sigbus(fd, total_size, total_size * 4);
}

static void test_fault_private(int fd, size_t total_size)
{
	test_fault_sigbus(fd, 0, total_size);
}
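
For anyone who wants to poke at the sigsetjmp()/siglongjmp() pattern outside of
KVM selftests, below is a standalone sketch of the "overflow" flavor.  It is an
analogue under stated assumptions, not the selftest: it uses a one-page
tmpfile() instead of a guest_memfd, and all demo_* names are hypothetical.  It
relies on the documented mmap() behavior that touching a page wholly beyond the
end of the backing file raises SIGBUS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf demo_jmpbuf;

static void demo_sigbus_handler(int signum)
{
	(void)signum;
	siglongjmp(demo_jmpbuf, 1);
}

/* Returns 1 if touching @addr raises SIGBUS, 0 if the access succeeds. */
static int demo_access_gets_sigbus(volatile uint8_t *addr, int do_write)
{
	struct sigaction sa_old, sa_new;
	int caught;

	memset(&sa_new, 0, sizeof(sa_new));
	sa_new.sa_handler = demo_sigbus_handler;
	sigemptyset(&sa_new.sa_mask);

	sigaction(SIGBUS, &sa_new, &sa_old);
	if (sigsetjmp(demo_jmpbuf, 1) == 0) {
		if (do_write)
			*addr = 0xaa;
		else
			(void)*addr;
		caught = 0;
	} else {
		/* Arrived here via siglongjmp() from the handler. */
		caught = 1;
	}
	sigaction(SIGBUS, &sa_old, NULL);

	return caught;
}

/*
 * Map two pages of a one-page file, then probe the last valid byte and the
 * first byte past EOF.  Returns 1 if only the out-of-range accesses SIGBUS.
 */
static int demo_fault_overflow(void)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	uint8_t *mem;
	int r, ok;

	assert(f != NULL);
	r = ftruncate(fileno(f), (off_t)page_size);
	assert(r == 0);

	mem = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fileno(f), 0);
	assert(mem != MAP_FAILED);

	ok = !demo_access_gets_sigbus(mem + page_size - 1, 1) &&
	     demo_access_gets_sigbus(mem + page_size, 0) &&
	     demo_access_gets_sigbus(mem + page_size, 1);

	munmap(mem, 2 * page_size);
	fclose(f);
	return ok;
}
```

Probing both a load and a store at the first inaccessible byte mirrors the
"bonus coverage" idea above; the "private" flavor has no direct analogue with a
regular file, since it needs guest_memfd to reject the fault.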

> > +static void test_fault_private(int fd, size_t total_size)
> > +{
> > +	void *mem = test_fault_sigbus(fd, total_size);
> > +
> > +	kvm_munmap(mem, total_size);
> > +}
> > +
> 
> Testing that faults fail when GUEST_MEMFD_FLAG_DEFAULT_SHARED is not set
> is a good idea. Perhaps it could be even clearer if further split up:
> 
> + test_mmap_supported()
>     + kvm_mmap()
>     + kvm_munmap()
> + test_mmap_supported_fault_supported()
>     + kvm_mmap()
>     + successful accesses to offsets within the size of the fd
>     + kvm_munmap()
> + test_mmap_supported_fault_sigbus()
>     + kvm_mmap()
>     + expect SIGBUS from accesses to offsets within the size of the fd
>     + kvm_munmap()
> 
> >  static void test_mmap_not_supported(int fd, size_t total_size)
> >  {
> >  	char *mem;
> > @@ -274,9 +299,12 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
> >  
> >  	gmem_test(file_read_write, vm, flags);
> >  
> > -	if (flags & GUEST_MEMFD_FLAG_MMAP) {
> > +	if (flags & GUEST_MEMFD_FLAG_MMAP &&
> > +	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
> >  		gmem_test(mmap_supported, vm, flags);
> >  		gmem_test(fault_overflow, vm, flags);
> > +	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
> > +		gmem_test(fault_private, vm, flags);
> 
> test_fault_private() makes me think the test is testing for private
> faults, but there's nothing private about this fault,

It's a user fault on private memory, not sure how else to describe that :-)
The CoCo shared vs. private and MAP_{SHARED,PRIVATE} collision is unfortunate,
but I think we should prioritize standardizing on CoCo shared vs. private since
that is what KVM will care about 99.9% of the time, i.e. in literally everything
except kvm_gmem_mmap().

> and the fault doesn't even come from the guest.

Sure, but I don't see what that has to do with anything, e.g. fault_overflow()
isn't a fault from the guest either.

> >  	} else {
> >  		gmem_test(mmap_not_supported, vm, flags);
> >  	}
> 
> If split up as described above, this could be
> 
> 	if (flags & GUEST_MEMFD_FLAG_MMAP &&
> 	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
> 		gmem_test(mmap_supported_fault_supported, vm, flags);
> 		gmem_test(fault_overflow, vm, flags);
> 	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
> 		gmem_test(mmap_supported_fault_sigbus, vm, flags);

I find these unintuitive, e.g. is this one "mmap() supported, test fault sigbus",
or is it "mmap(), test supported fault sigbus"?  I also don't like that some of
the test names describe the _result_ (SIGBUS), whereas others describe _what_
is being tested.

In general, I don't like test names that describe the result, because IMO what
is being tested is far more interesting.  E.g. from a test coverage perspective,
I don't care if attempting to fault in (CoCo) private memory gets SIGBUS versus
SIGSEGV, but I most definitely care that we have test coverage for the "what".

Looking at everything, I think the only thing that doesn't fit well is the CoW
scenario.  What if we extract that to its own helper?  That would eliminate the
ugly test_mmap_common().

So my vote would be to keep things largely the same:

	if (flags & GUEST_MEMFD_FLAG_MMAP &&
	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
		gmem_test(mmap_supported, vm, flags);
		gmem_test(mmap_cow, vm, flags);
		gmem_test(fault_overflow, vm, flags);
		gmem_test(mbind, vm, flags);
		gmem_test(numa_allocation, vm, flags);
	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
		gmem_test(fault_private, vm, flags);
	} else {
		gmem_test(mmap_not_supported, vm, flags);
	}

* Re: [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-29 18:10     ` Sean Christopherson
@ 2025-09-29 18:35       ` Sean Christopherson
  2025-09-30  7:53       ` Ackerley Tng
  1 sibling, 0 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-29 18:35 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

On Mon, Sep 29, 2025, Sean Christopherson wrote:
> How's this look?
> 
> static void test_fault_sigbus(int fd, size_t accessible_size, size_t mmap_size)
> {
> 	struct sigaction sa_old, sa_new = {
> 		.sa_handler = fault_sigbus_handler,
> 	};
> 	const uint8_t val = 0xaa;
> 	uint8_t *mem;
> 	size_t i;
> 
> 	mem = kvm_mmap(mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> 
> 	sigaction(SIGBUS, &sa_new, &sa_old);
> 	if (sigsetjmp(jmpbuf, 1) == 0) {
> 		memset(mem, val, mmap_size);
> 		TEST_FAIL("memset() should have triggered SIGBUS");
> 	}
> 	if (sigsetjmp(jmpbuf, 1) == 0) {
> 		(void)READ_ONCE(mem[accessible_size]);
> 		TEST_FAIL("load at first unaccessible byte should have triggered SIGBUS");
> 	}
> 	sigaction(SIGBUS, &sa_old, NULL);
> 
> 	for (i = 0; i < accessible_size; i++)
> 		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
> 
> 	kvm_munmap(mem, mmap_size);
> }
> 
> static void test_fault_overflow(int fd, size_t total_size)
> {
> 	test_fault_sigbus(fd, total_size, total_size * 4);
> }
> 
> static void test_fault_private(int fd, size_t total_size)
> {
> 	test_fault_sigbus(fd, 0, total_size);
> }

And if I don't wantonly change variable names/types, the diff is much cleaner:

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 8ed08be72c43..8e375de2d7d8 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -83,12 +83,11 @@ void fault_sigbus_handler(int signum)
        siglongjmp(jmpbuf, 1);
 }
 
-static void test_fault_overflow(int fd, size_t total_size)
+static void test_fault_sigbus(int fd, size_t accessible_size, size_t map_size)
 {
        struct sigaction sa_old, sa_new = {
                .sa_handler = fault_sigbus_handler,
        };
-       size_t map_size = total_size * 4;
        const char val = 0xaa;
        char *mem;
        size_t i;
@@ -102,12 +101,22 @@ static void test_fault_overflow(int fd, size_t total_size)
        }
        sigaction(SIGBUS, &sa_old, NULL);
 
-       for (i = 0; i < total_size; i++)
+       for (i = 0; i < accessible_size; i++)
                TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
 
        kvm_munmap(mem, map_size);
 }
 
+static void test_fault_overflow(int fd, size_t total_size)
+{
+       test_fault_sigbus(fd, total_size, total_size * 4);
+}
+
+static void test_fault_private(int fd, size_t total_size)
+{
+       test_fault_sigbus(fd, 0, total_size);
+}
+
 static void test_mmap_not_supported(int fd, size_t total_size)
 {
        char *mem;
@@ -279,10 +288,13 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
 
        gmem_test(file_read_write, vm, flags);
 
-       if (flags & GUEST_MEMFD_FLAG_MMAP) {
+       if (flags & GUEST_MEMFD_FLAG_MMAP &&
+           flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
                gmem_test(mmap_supported, vm, flags);
                gmem_test(mmap_cow, vm, flags);
                gmem_test(fault_overflow, vm, flags);
+       } else if (flags & GUEST_MEMFD_FLAG_MMAP) {
+               gmem_test(fault_private, vm, flags);
        } else {
                gmem_test(mmap_not_supported, vm, flags);
        }
@@ -300,9 +312,11 @@ static void test_guest_memfd(unsigned long vm_type)
 
        __test_guest_memfd(vm, 0);
 
-       if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
+       if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP)) {
+               __test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP);
                __test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP |
                                       GUEST_MEMFD_FLAG_DEFAULT_SHARED);
+       }
 
        kvm_vm_free(vm);
 }

* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-29 16:55             ` Sean Christopherson
@ 2025-09-30  0:15               ` Sean Christopherson
  2025-09-30  8:36                 ` Ackerley Tng
  2025-10-01 14:22                 ` Vishal Annapurve
  0 siblings, 2 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-09-30  0:15 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: David Hildenbrand, Patrick Roy, Fuad Tabba, Paolo Bonzini,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
	linux-kernel, Nikita Kalyazin, shivankg

On Mon, Sep 29, 2025, Sean Christopherson wrote:
> On Mon, Sep 29, 2025, Ackerley Tng wrote:
> > David Hildenbrand <david@redhat.com> writes:
> > 
> > >                           GUEST_MEMFD_FLAG_DEFAULT_SHARED;
> > >>>>
> > >>>> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
> > >>>> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
> > >>>> checking for that, at least until we have in-place conversion? Having
> > >>>> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP,
> > >>>> isn't a useful combination.
> > >>>>
> > >>>
> > >>> I think it's okay to have the two flags be orthogonal from the start.
> > >> 
> > >> I think I dimly remember someone at one of the guest_memfd syncs
> > >> bringing up a usecase for having a VMA even if all memory is private,
> > >> not for faulting anything in, but to do madvise or something? Maybe it
> > >> was the NUMA stuff? (+Shivank)
> > >
> > > Yes, that should be it. But we're never faulting in these pages, we only 
> > > need the VMA (for the time being, until there is the in-place conversion).
> > >
> > 
> > Yup, Sean's patch disables faulting if GUEST_MEMFD_FLAG_DEFAULT_SHARED
> > is not set, but mmap() is always enabled so madvise() still works.
> 
> Hah!  I totally intended that :-D
> 
> > Requiring GUEST_MEMFD_FLAG_DEFAULT_SHARED to be set together with
> > GUEST_MEMFD_FLAG_MMAP would still allow madvise() to work since
> > GUEST_MEMFD_FLAG_DEFAULT_SHARED only gates faulting.
> > 
> > To clarify, I'm still for making GUEST_MEMFD_FLAG_DEFAULT_SHARED
> > orthogonal to GUEST_MEMFD_FLAG_MMAP with no additional checks on top of
> > whatever's in this patch. :)

Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
KVM_CAP_GUEST_MEMFD_MMAP.  Two things:

 1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
    that we don't need to add a capability every time a new flag comes along,
    and so that userspace can gather all flags in a single ioctl.  If gmem ever
    supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
    that's a non-issue relatively speaking.

 2. We should allow mmap() for x86 CoCo VMs right away.  As evidenced by this
    series, mmap() on private memory is totally fine.  It's not useful until the
    NUMA and/or in-place conversion support comes along, but it's not dangerous in
    any way.  The actual restriction is on initializing memory to be shared,
    because allowing memory to be shared from gmem's perspective while it's
    private from the VM's perspective would be all kinds of broken.


E.g. with a s/kvm_arch_supports_gmem_mmap/kvm_arch_supports_gmem_init_shared:

	case KVM_CAP_GUEST_MEMFD_FLAGS:
		if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
			return GUEST_MEMFD_FLAG_MMAP |
			       GUEST_MEMFD_FLAG_INIT_SHARED;

		return GUEST_MEMFD_FLAG_MMAP;

#2 is also a good reason to add INIT_SHARED straightaway.  Without INIT_SHARED,
we'd have to add INIT_PRIVATE to make the NUMA support useful for x86 CoCo VMs, i.e.
it's not just in-place conversion that's affected, IIUC.

I'll add this in v2.
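
To illustrate why a single flags capability is convenient for userspace, here
is a small sketch of validating a requested flag set against the reported mask.
This is only a model of the idea: KVM_CAP_GUEST_MEMFD_FLAGS is merely proposed
above, and the DEMO_* bit values and demo_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define DEMO_GMEM_FLAG_MMAP		(1ULL << 0)
#define DEMO_GMEM_FLAG_INIT_SHARED	(1ULL << 1)

/* Models the proposed capability: report every supported flag at once. */
static uint64_t demo_supported_gmem_flags(int arch_supports_init_shared)
{
	if (arch_supports_init_shared)
		return DEMO_GMEM_FLAG_MMAP | DEMO_GMEM_FLAG_INIT_SHARED;

	return DEMO_GMEM_FLAG_MMAP;
}

/* A request is valid only if it sets no bit outside the supported mask. */
static int demo_gmem_flags_valid(uint64_t supported, uint64_t requested)
{
	return !(requested & ~supported);
}
```

With this shape, a CoCo-like VM (no INIT_SHARED support) can still request
MMAP on its own, which matches point #2: mmap() is allowed, only initializing
memory as shared is restricted.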

* Re: [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test
  2025-09-29 16:58     ` Sean Christopherson
@ 2025-09-30  6:52       ` Ackerley Tng
  0 siblings, 0 replies; 55+ messages in thread
From: Ackerley Tng @ 2025-09-30  6:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Sep 29, 2025, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > Use a global variable to track the host page size in the guest_memfd test
>> > so that the information doesn't need to be constantly passed around.  The
>> > state is purely a reflection of the underlying system, i.e. can't be set
>> > by the test and is constant for a given invocation of the test, and thus
>> > explicitly passing the host page size to individual testcases adds no
>> > value, e.g. doesn't allow testing different combinations.
>> >
>> 
>> I was going to pass in page_size to each of these test cases to test
>> HugeTLB support, that's how page_size crept into the parameters of these
>> functions.
>> 
>> Could we do a getpagesize() within the gmem_test() macro that you
>> introduced instead?
>
> We could, and I actually had it that way to start.  But I found that burying the
> effective setting of page_size made it harder to see that it's a runtime constant,
> versus something that can be configured by the test.

I guess I could also just update the global static variable page_size
for HugeTLB tests since we won't be running tests with different page
sizes in parallel. Maybe that's better, actually.
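
A rough sketch of the "global runtime constant" approach being discussed, with
hypothetical demo_* helpers (the actual test's initialization is not shown in
this thread).  The global is set once from the host, and testcases read it
instead of taking a page_size parameter:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Set once at startup; testcases treat it as a constant of the host. */
static size_t page_size;

static void demo_init_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	assert(sz > 0);
	page_size = (size_t)sz;
}

/* Example consumer: derive a test size without a page_size parameter. */
static size_t demo_total_size(size_t nr_pages)
{
	return nr_pages * page_size;
}
```

A HugeTLB variant could overwrite the global before running its testcases, as
suggested above, since tests with different page sizes don't run in parallel.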

* Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  2025-09-29 17:32     ` Sean Christopherson
@ 2025-09-30  7:09       ` Ackerley Tng
  2025-09-30 14:24         ` Sean Christopherson
  0 siblings, 1 reply; 55+ messages in thread
From: Ackerley Tng @ 2025-09-30  7:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Sep 29, 2025, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > Add and use wrappers for mmap() and munmap() that assert success to reduce
>> > a significant amount of boilerplate code, to ensure all tests assert on
>> > failure, and to provide consistent error messages on failure.
>> >
>> > No functional change intended.
>> >
>> > Signed-off-by: Sean Christopherson <seanjc@google.com>
>> > ---
>> >  .../testing/selftests/kvm/guest_memfd_test.c  | 21 +++------
>> >  .../testing/selftests/kvm/include/kvm_util.h  | 25 +++++++++++
>> >  tools/testing/selftests/kvm/lib/kvm_util.c    | 44 +++++++------------
>> >  tools/testing/selftests/kvm/mmu_stress_test.c |  5 +--
>> >  .../selftests/kvm/s390/ucontrol_test.c        | 16 +++----
>> >  .../selftests/kvm/set_memory_region_test.c    | 17 ++++---
>> >  6 files changed, 64 insertions(+), 64 deletions(-)
>> >
>> > 
>> > [...snip...]
>> > 
>> > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
>> > index 23a506d7eca3..1c68ff0fb3fb 100644
>> > --- a/tools/testing/selftests/kvm/include/kvm_util.h
>> > +++ b/tools/testing/selftests/kvm/include/kvm_util.h
>> > @@ -278,6 +278,31 @@ static inline bool kvm_has_cap(long cap)
>> >  #define __KVM_SYSCALL_ERROR(_name, _ret) \
>> >  	"%s failed, rc: %i errno: %i (%s)", (_name), (_ret), errno, strerror(errno)
>> >  
>> > +static inline void *__kvm_mmap(size_t size, int prot, int flags, int fd,
>> > +			       off_t offset)
>> 
>> Do you have a policy/rationale for putting this in kvm_util.h as opposed
>> to test_util.h? I like the idea of this wrapper but I thought this is
>> less of a kvm thing and more of a test utility, and hence it belongs in
>> test_util.c and test_util.h.
>
> To be perfectly honest, I forgot test_util.h existed :-)
>

Merging/dropping one of kvm_util.h vs test_util.h is a good idea. The
distinction is not clear and it's already kind of messy between the two.

>> Also, the name kind of associates mmap with KVM too closely IMO, but
>> test_mmap() is not a great name either.
>
> Which file will hopefully be irrelevant, because ideally it'll be temporary (see
> below). But if someone has a strong opinion and/or better idea on the name prefix,
> I definitely want to settle on a name for syscall wrappers, because I want to go
> much further than just adding an mmap() wrapper.  I chose kvm_ because there's
> basically zero chance that will ever conflict with generic selftests functionality,
> and the wrappers utilize TEST_ASSERT(), which is unique to KVM selftests.
>
> As for why the current location will hopefully be temporary, and why I want to
> settle on a name, I have patches to add several more wrappers, along with
> infrastructure to make it super easy to add new wrappers.  When trying to sort
> out the libnuma stuff for Shivank's series[*], I discovered that KVM selftests
> already has a (very partial, very crappy) libnuma equivalent in
> tools/testing/selftests/kvm/include/numaif.h.
>
> Adding wrappers for NUMA syscalls became an exercise in frustration (so much
> uninteresting boilerplate, and I kept making silly mistakes), and so that combined
> with the desire for mmap() and munmap() wrappers motivated me to add a macro
> framework similar to the kernel's SYSCALL_DEFINE magic.
>
> So, I've got patches (that I'll post with the next version of the gmem NUMA
> series) that add tools/testing/selftests/kvm/include/kvm_syscalls.h, and
> __kvm_mmap() will be moved there (ideally it wouldn't move, but I want to land
> this small series in 6.18, and so wanted to keep the changes for 6.18 small-ish).
>
> For lack of a better namespace, and because we already have __KVM_SYSCALL_ERROR(),
> I picked KVM_SYSCALL_DEFINE() for the "standard" builder, e.g. libnuma equivalents,
> and then __KVM_SYSCALL_DEFINE() for a KVM selftests specific version to handle
> asserting success.
>

It's a common pattern in KVM selftests to have a syscall/ioctl wrapper
foo() that asserts defaults and a __foo() that doesn't assert anything
and allows tests to assert something else, but I have a contrary
opinion.

I think it's better that tests be explicit about what they're testing
for, so perhaps it's better to use macros like TEST_ASSERT_EQ() to
explicitly call a function and check the results.

Or perhaps it should be more explicit, like in the name, that an
assertion is made within this function?

In many cases a foo() exists without the corresponding __foo(), which
seems to be discouraging testing for error cases.

Also, I guess especially for vcpu_run(), tests would like to loop/take
different actions based on different errnos and then it gets a bit
unwieldy to have to avoid functions that have assertions within them.

I can see people forgetting to add TEST_ASSERT_EQ()s to check results of
setup/teardown functions but I think those errors would surface some
other way anyway.

Not a strongly-held opinion, and no major concerns on the naming
either. It's a selftest after all and IIUC we're okay to have selftest
interfaces change anyway?
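
To make the foo()/__foo() split concrete, here is a minimal sketch with
hypothetical demo_ names (not the selftests' kvm_mmap() implementation, and
using abort() in place of TEST_ASSERT()).  The double-underscore variant
returns the raw result so a test can check for a specific failure, while the
plain variant asserts success:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Raw variant: no assertion, caller inspects the result and errno. */
static inline void *__demo_mmap(size_t size, int prot, int flags, int fd,
				off_t offset)
{
	return mmap(NULL, size, prot, flags, fd, offset);
}

/* Asserting variant: for setup code where failure is never expected. */
static inline void *demo_mmap(size_t size, int prot, int flags, int fd)
{
	void *mem = __demo_mmap(size, prot, flags, fd, 0);

	if (mem == MAP_FAILED) {
		fprintf(stderr, "mmap() failed, errno: %i (%s)\n",
			errno, strerror(errno));
		abort();
	}
	return mem;
}
```

An error-path test would call __demo_mmap() and check errno itself, which is
the case that gets harder when only the asserting variant exists.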

> /* Define a kvm_<syscall>() API to assert success. */
> #define __KVM_SYSCALL_DEFINE(name, nr_args, args...)			\
> static inline void kvm_##name(DECLARE_ARGS(nr_args, args))		\
> {									\
> 	int r;								\
> 									\
> 	r = name(UNPACK_ARGS(nr_args, args));				\
> 	TEST_ASSERT(!r, __KVM_SYSCALL_ERROR(#name, r));			\
> }
>
> /*
>  * Macro to define syscall APIs, either because KVM selftests doesn't link to
>  * the standard library, e.g. libnuma, or because there is no library that yet
>  * provides the syscall.  These
>  */
> #define KVM_SYSCALL_DEFINE(name, nr_args, args...)			\
> static inline long name(DECLARE_ARGS(nr_args, args))			\
> {									\
> 	return syscall(__NR_##name, UNPACK_ARGS(nr_args, args));	\
> }									\
> __KVM_SYSCALL_DEFINE(name, nr_args, args)
>
>
> The usage looks like this (which is odd at first glance, but makes it trivially
> easy to copy+paste from the kernel's SYSCALL_DEFINE invocations):
>
> KVM_SYSCALL_DEFINE(get_mempolicy, 5, int *, policy, const unsigned long *, nmask,
> 		   unsigned long, maxnode, void *, addr, int, flags);
>
> KVM_SYSCALL_DEFINE(set_mempolicy, 3, int, mode, const unsigned long *, nmask,
> 		   unsigned long, maxnode);
>
> KVM_SYSCALL_DEFINE(set_mempolicy_home_node, 4, unsigned long, start,
> 		   unsigned long, len, unsigned long, home_node,
> 		   unsigned long, flags);
>
> KVM_SYSCALL_DEFINE(migrate_pages, 4, int, pid, unsigned long, maxnode,
> 		   const unsigned long *, frommask, const unsigned long *, tomask);
>
> KVM_SYSCALL_DEFINE(move_pages, 6, int, pid, unsigned long, count, void *, pages,
> 		   const int *, nodes, int *, status, int, flags);
>
> KVM_SYSCALL_DEFINE(mbind, 6, void *, addr, unsigned long, size, int, mode,
> 		   const unsigned long *, nodemask, unsigned long, maxnode,
> 		   unsigned int, flags);
>
> __KVM_SYSCALL_DEFINE(munmap, 2, void *, mem, size_t, size);
> __KVM_SYSCALL_DEFINE(close, 1, int, fd);
> __KVM_SYSCALL_DEFINE(fallocate, 4, int, fd, int, mode, loff_t, offset, loff_t, len);
> __KVM_SYSCALL_DEFINE(ftruncate, 2, unsigned int, fd, off_t, length);
>
> [*] https://lore.kernel.org/all/0e986bdb-7d1b-4c14-932e-771a87532947@amd.com

* Re: [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-29 18:10     ` Sean Christopherson
  2025-09-29 18:35       ` Sean Christopherson
@ 2025-09-30  7:53       ` Ackerley Tng
  2025-09-30 14:58         ` Sean Christopherson
  1 sibling, 1 reply; 55+ messages in thread
From: Ackerley Tng @ 2025-09-30  7:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Sep 29, 2025, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> 
>> [...snip...]
>> 
>> > -static void test_fault_overflow(int fd, size_t total_size)
>> > +static void *test_fault_sigbus(int fd, size_t size)
>> >  {
>> >  	struct sigaction sa_old, sa_new = {
>> >  		.sa_handler = fault_sigbus_handler,
>> >  	};
>> > -	size_t map_size = total_size * 4;
>> > -	const char val = 0xaa;
>> > -	char *mem;
>> > -	size_t i;
>> > +	void *mem;
>> >  
>> > -	mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
>> > +	mem = test_mmap_common(fd, size);
>> >  
>> >  	sigaction(SIGBUS, &sa_new, &sa_old);
>> >  	if (sigsetjmp(jmpbuf, 1) == 0) {
>> > -		memset(mem, 0xaa, map_size);
>> > +		memset(mem, 0xaa, size);
>> >  		TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
>> >  	}
>> >  	sigaction(SIGBUS, &sa_old, NULL);
>> >  
>> > +	return mem;
>> 
>> I think returning the userspace address from a test is a little hard to
>> follow. This one feels even more unexpected because a valid address is
>> being returned (and used) from a test that has sigbus in its name.
>
> Yeah, and it's fugly all around.  If we pass in the "accessible" size, then we
> can reduce the amount of copy+paste, eliminate the weird return and split mmap()
> versus munmap(), and get bonus coverage that reads SIGBUS as well.
>
> How's this look?
>
> static void test_fault_sigbus(int fd, size_t accessible_size, size_t mmap_size)
> {
> 	struct sigaction sa_old, sa_new = {
> 		.sa_handler = fault_sigbus_handler,
> 	};
> 	const uint8_t val = 0xaa;
> 	uint8_t *mem;
> 	size_t i;
>
> 	mem = kvm_mmap(mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
>
> 	sigaction(SIGBUS, &sa_new, &sa_old);
> 	if (sigsetjmp(jmpbuf, 1) == 0) {
> 		memset(mem, val, mmap_size);
> 		TEST_FAIL("memset() should have triggered SIGBUS");
> 	}
> 	if (sigsetjmp(jmpbuf, 1) == 0) {
> 		(void)READ_ONCE(mem[accessible_size]);
> 		TEST_FAIL("load at first unaccessible byte should have triggered SIGBUS");
> 	}
> 	sigaction(SIGBUS, &sa_old, NULL);
>
> 	for (i = 0; i < accessible_size; i++)
> 		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
>
> 	kvm_munmap(mem, mmap_size);
> }
>
> static void test_fault_overflow(int fd, size_t total_size)
> {
> 	test_fault_sigbus(fd, total_size, total_size * 4);
> }
>

Is it intentional that the same SIGBUS on offset mem + total_size is
triggered twice? The memset would have worked fine until offset mem +
total_size, which is the same SIGBUS case as mem[accessible_size]. Or
was it meant to test that both read and write trigger SIGBUS?

> static void test_fault_private(int fd, size_t total_size)
> {
> 	test_fault_sigbus(fd, 0, total_size);
> }
>

I would prefer more unrolling to avoid mental hoops within test code,
perhaps like (not compile tested):

static void assert_host_fault_sigbus(uint8_t *mem)
{
	struct sigaction sa_old, sa_new = {
		.sa_handler = fault_sigbus_handler,
	};

	sigaction(SIGBUS, &sa_new, &sa_old);
	if (sigsetjmp(jmpbuf, 1) == 0) {
		(void)READ_ONCE(*mem);
		TEST_FAIL("Reading %p should have triggered SIGBUS", mem);
	}
	sigaction(SIGBUS, &sa_old, NULL);
}

static void test_fault_overflow(int fd, size_t total_size)
{
	size_t mmap_size = total_size * 2;
	const uint8_t val = 0xaa;
	uint8_t *mem;
	size_t i;

	mem = kvm_mmap(mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);

	/* Fill the accessible range so that the reads below have a known value. */
	memset(mem, val, total_size);
	for (i = 0; i < total_size; i++)
		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);

	assert_host_fault_sigbus(mem + total_size);

	kvm_munmap(mem, mmap_size);
}

static void test_fault_private(int fd, size_t total_size)
{
	uint8_t *mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);

	assert_host_fault_sigbus(mem);

	kvm_munmap(mem, total_size);
}

assert_host_fault_sigbus() can then be flexibly reused for conversion
tests (coming up) at various offsets from the mmap()-ed addresses.

At some point, sigaction, sigsetjmp, etc could perhaps even be further
wrapped. For testing memory_failure() for guest_memfd we will want to
check for SIGBUS on memory failure injection instead of on host fault.

Would be nice if it looked like this (maybe not in this patch series):

+ TEST_ASSERT_WILL_SIGBUS(READ_ONCE(mem[i]))
+ TEST_ASSERT_WILL_SIGBUS(WRITE_ONCE(mem[i]))
+ TEST_ASSERT_WILL_SIGBUS(madvise(MADV_HWPOISON))

>> > +static void test_fault_private(int fd, size_t total_size)
>> > +{
>> > +	void *mem = test_fault_sigbus(fd, total_size);
>> > +
>> > +	kvm_munmap(mem, total_size);
>> > +}
>> > +
>> 
>> Testing that faults fail when GUEST_MEMFD_FLAG_DEFAULT_SHARED is not set
>> is a good idea. Perhaps it could be even clearer if further split up:
>> 
>> + test_mmap_supported()
>>     + kvm_mmap()
>>     + kvm_munmap()
>> + test_mmap_supported_fault_supported()
>>     + kvm_mmap()
>>     + successful accesses to offsets within the size of the fd
>>     + kvm_munmap()
>> + test_mmap_supported_fault_sigbus()
>>     + kvm_mmap()
>>     + expect SIGBUS from accesses to offsets within the size of the fd
>>     + kvm_munmap()
>> 
>> >  static void test_mmap_not_supported(int fd, size_t total_size)
>> >  {
>> >  	char *mem;
>> > @@ -274,9 +299,12 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>> >  
>> >  	gmem_test(file_read_write, vm, flags);
>> >  
>> > -	if (flags & GUEST_MEMFD_FLAG_MMAP) {
>> > +	if (flags & GUEST_MEMFD_FLAG_MMAP &&
>> > +	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
>> >  		gmem_test(mmap_supported, vm, flags);
>> >  		gmem_test(fault_overflow, vm, flags);
>> > +	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
>> > +		gmem_test(fault_private, vm, flags);
>> 
>> test_fault_private() makes me think the test is testing for private
>> faults, but there's nothing private about this fault,
>
> It's a user fault on private memory, not sure how else to describe that :-)
> The CoCo shared vs. private and MAP_{SHARED,PRIVATE} collision is unfortunate,
> but I think we should prioritize standardizing on CoCo shared vs. private since
> that is what KVM will care about 99.9% of the time, i.e. in literally everything
> except kvm_gmem_mmap().
>
>> and the fault doesn't even come from the guest.
>
> Sure, but I don't see what that has to do with anything, e.g. fault_overflow()
> isn't a fault from the guest either.
>

Maybe it's the frame of mind I'm working in (conversions), where all
private faults must be from the guest or from KVM. Feel free to ignore this.

>> >  	} else {
>> >  		gmem_test(mmap_not_supported, vm, flags);
>> >  	}
>> 
>> If split up as described above, this could be
>> 
>> 	if (flags & GUEST_MEMFD_FLAG_MMAP &&
>> 	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
>> 		gmem_test(mmap_supported_fault_supported, vm, flags);
>> 		gmem_test(fault_overflow, vm, flags);
>> 	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
>> 		gmem_test(mmap_supported_fault_sigbus, vm, flags);
>
> I find these unintuitive, e.g. is this one "mmap() supported, test fault sigbus",
> or is it "mmap(), test supported fault sigbus".  I also don't like that some of
> the test names describe the _result_ (SIGBUS), whereas others describe _what_
> is being tested.
>

I think of the result (SIGBUS) as part of what's being tested. So
test_supported_fault_sigbus() is testing that mmap is supported, and
faulting will result in a SIGBUS.

> In general, I don't like test names that describe the result, because IMO what
> is being tested is far more interesting.  E.g. from a test coverage perspective,
> I don't care if attempting to fault in (CoCo) private memory gets SIGBUS versus
> SIGSEGV, but I most definitely care that we have test coverage for the "what".
>

The SIGBUS is part of the contract with userspace and that's also part
of what's being tested IMO.

That said, I agree we don't need sigbus in the name, I guess I just
meant that there are a few layers to test here and I couldn't find a
better name:

1. mmap() succeeds to start with
2. mmap() succeeds, and faulting also succeeds
    + mmap() works, and faulting does not succeed because memory is not
      intended to be accessible to the host
3. mmap() succeeds, and faulting also succeeds, but only within the size of
   guest_memfd

> Looking at everything, I think the only thing that doesn't fit well is the CoW
> scenario.  What if we extract that to its own helper?  That would eliminate the
> ugly test_mmap_common(), 
>

Extracting the CoW scenario is good, thanks!

> So my vote would be to keep things largely the same:
>
> 	if (flags & GUEST_MEMFD_FLAG_MMAP &&
> 	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
> 		gmem_test(mmap_supported, vm, flags);
> 		gmem_test(mmap_cow, vm, flags);
> 		gmem_test(fault_overflow, vm, flags);
> 		gmem_test(mbind, vm, flags);
> 		gmem_test(numa_allocation, vm, flags);
> 	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
> 		gmem_test(fault_private, vm, flags);
> 	} else {
> 		gmem_test(mmap_not_supported, vm, flags);
> 	}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-30  0:15               ` Sean Christopherson
@ 2025-09-30  8:36                 ` Ackerley Tng
  2025-10-01 14:22                 ` Vishal Annapurve
  1 sibling, 0 replies; 55+ messages in thread
From: Ackerley Tng @ 2025-09-30  8:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: David Hildenbrand, Patrick Roy, Fuad Tabba, Paolo Bonzini,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
	linux-kernel, Nikita Kalyazin, shivankg

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Sep 29, 2025, Sean Christopherson wrote:
>> On Mon, Sep 29, 2025, Ackerley Tng wrote:
>> > David Hildenbrand <david@redhat.com> writes:
>> > 
>> > >                           GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>> > >>>>
>> > >>>> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
>> > >>>> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
>> > >>>> checking for that, at least until we have in-place conversion? Having
>> > >>>> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but not GUEST_MEMFD_FLAG_MMAP,
>> > >>>> isn't a useful combination.
>> > >>>>
>> > >>>
>> > >>> I think it's okay to have the two flags be orthogonal from the start.
>> > >> 
>> > >> I think I dimly remember someone at one of the guest_memfd syncs
>> > >> bringing up a usecase for having a VMA even if all memory is private,
>> > >> not for faulting anything in, but to do madvise or something? Maybe it
>> > >> was the NUMA stuff? (+Shivank)
>> > >
>> > > Yes, that should be it. But we're never faulting in these pages, we only 
>> > > need the VMA (for the time being, until there is the in-place conversion).
>> > >
>> > 
>> > Yup, Sean's patch disables faulting if GUEST_MEMFD_FLAG_DEFAULT_SHARED
>> > is not set, but mmap() is always enabled so madvise() still works.
>> 
>> Hah!  I totally intended that :-D
>> 
>> > Requiring GUEST_MEMFD_FLAG_DEFAULT_SHARED to be set together with
>> > GUEST_MEMFD_FLAG_MMAP would still allow madvise() to work since
>> > GUEST_MEMFD_FLAG_DEFAULT_SHARED only gates faulting.
>> > 
>> > To clarify, I'm still for making GUEST_MEMFD_FLAG_DEFAULT_SHARED
>> > orthogonal to GUEST_MEMFD_FLAG_MMAP with no additional checks on top of
>> > whatever's in this patch. :)
>
> Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
>
>  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
>     that we don't need to add a capability every time a new flag comes along,
>     and so that userspace can gather all flags in a single ioctl.  If gmem ever
>     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
>     that's a non-issue relatively speaking.
>

This is a good idea. In my internal WIP series I have 3 flags and 4
CAPs, lol. Some of those CAPs are not for new flags, though.

Would like to check your rationale for future reference: how about
generalizing beyond flags and having KVM_CAP_GUEST_MEMFD_CAPS which
returns 32 bits, one bit for every guest_memfd-related (not necessarily
flags-related) cap?

>  2. We should allow mmap() for x86 CoCo VMs right away.  As evidenced by this
>     series, mmap() on private memory is totally fine.  It's not useful until the
>     NUMA and/or in-place conversion support comes along, but it's not dangerous in
>     any way.  The actual restriction is on initializing memory to be shared,

The actual restriction is that private memory must not be mapped to host
userspace, so it's not really about initializing, though before
conversion, initialization state is the only state.

With GUEST_MEMFD_FLAG_INIT_SHARED, the entire guest_memfd is shared and
mappable; without GUEST_MEMFD_FLAG_INIT_SHARED the entire guest_memfd is
private and not mappable (gated in kvm_gmem_fault_user_mapping()).

So yes, I agree that CoCo VMs should be allowed mmap() but not
GUEST_MEMFD_FLAG_INIT_SHARED, since GUEST_MEMFD_FLAG_INIT_SHARED makes
the entire guest_memfd take the shared state for the lifetime of
guest_memfd.
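
To make sure I have the rules straight, here is a tiny userspace model of how
I understand the two flags to interact.  This is only a sketch: the flag bit
values and the gmem_model_*() helpers are made up for illustration and are not
the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bit values are assumptions for this sketch, not the uapi values. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)

/* mmap() depends only on the MMAP flag, so a VMA exists even if all memory is private. */
static bool gmem_model_can_mmap(uint64_t flags)
{
	return !!(flags & GUEST_MEMFD_FLAG_MMAP);
}

/*
 * A host userspace fault is expected to succeed only if the file was created
 * as shared and the offset lies within the file; otherwise expect SIGBUS.
 */
static bool gmem_model_can_fault(uint64_t flags, size_t offset, size_t size)
{
	if (!gmem_model_can_mmap(flags))
		return false;
	if (!(flags & GUEST_MEMFD_FLAG_INIT_SHARED))
		return false;
	return offset < size;
}

/* Hypothetical walk through the three cases the selftest distinguishes. */
static void gmem_model_example(void)
{
	const uint64_t shared = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;

	assert(gmem_model_can_fault(shared, 0, 4096));		/* in-bounds access works */
	assert(!gmem_model_can_fault(shared, 4096, 4096));	/* overflow faults */
	assert(!gmem_model_can_fault(GUEST_MEMFD_FLAG_MMAP, 0, 4096)); /* private faults */
}
```

Once conversions exist, the INIT_SHARED check would presumably become a
per-offset lookup instead of a per-file flag, but that is beyond this sketch.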

This is turning out to be a much nicer cleanup :)

>     because allowing memory to be shared from gmem's perspective while it's
>     private from the VM's perspective would be all kinds of broken.
>
>
> E.g. with a s/kvm_arch_supports_gmem_mmap/kvm_arch_supports_gmem_init_shared:
>
> 	case KVM_CAP_GUEST_MEMFD_FLAGS:
> 		if (!kvm || kvm_arch_supports_init_shared(kvm))
> 			return GUEST_MEMFD_FLAG_MMAP |
> 			       GUEST_MEMFD_FLAG_INIT_SHARED;
>
> 		return GUEST_MEMFD_FLAG_MMAP;
>

You might end up with this while actually coding v2 up, but how about

	case KVM_CAP_GUEST_MEMFD_FLAGS: {
		int flag_caps = GUEST_MEMFD_FLAG_MMAP;

		if (!kvm || kvm_arch_supports_init_shared(kvm))
			flag_caps |= GUEST_MEMFD_FLAG_INIT_SHARED;

		return flag_caps;
	}

Then all the new non-optional CAPs can be or-ed onto flag_caps from the
start.
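
For what it's worth, this is roughly how I picture userspace consuming the
returned mask.  It is a standalone sketch, not KVM code: the flag bit values
are assumptions, and gmem_supported_flags() and gmem_flags_valid() are
hypothetical helpers that only mirror the logic above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bit values are assumptions for this sketch; check the uapi header. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)

/*
 * Hypothetical stand-in for the capability handler: @has_vm mirrors
 * "kvm != NULL" and @arch_init_shared mirrors the arch hook's answer.
 */
static uint64_t gmem_supported_flags(bool has_vm, bool arch_init_shared)
{
	uint64_t flag_caps = GUEST_MEMFD_FLAG_MMAP;

	if (!has_vm || arch_init_shared)
		flag_caps |= GUEST_MEMFD_FLAG_INIT_SHARED;

	return flag_caps;
}

/* Userspace side: a create request is valid only if it uses supported bits. */
static bool gmem_flags_valid(uint64_t supported, uint64_t requested)
{
	return !(requested & ~supported);
}

/* Example: a VM type whose arch hook says INIT_SHARED is not allowed. */
static void gmem_flags_example(void)
{
	uint64_t supported = gmem_supported_flags(true, false);

	assert(gmem_flags_valid(supported, GUEST_MEMFD_FLAG_MMAP));
	assert(!gmem_flags_valid(supported, GUEST_MEMFD_FLAG_INIT_SHARED));
}
```

A single mask check like gmem_flags_valid() is the main attraction for me:
userspace does not need one capability query per flag.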
        
> #2 is also a good reason to add INIT_SHARED straightaway.  Without INIT_SHARED,
> we'd have to INIT_PRIVATE to make the NUMA support useful for x86 CoCo VMs, i.e.
> it's not just in-place conversion that's affected, IIUC.
>
> I'll add this in v2.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  2025-09-30  7:09       ` Ackerley Tng
@ 2025-09-30 14:24         ` Sean Christopherson
  2025-10-01 10:18           ` Ackerley Tng
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-09-30 14:24 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

On Tue, Sep 30, 2025, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
> > To be perfectly honest, I forgot test_util.h existed :-)
> 
> Merging/dropping one of kvm_util.h vs test_util.h is a good idea. The
> distinction is not clear and it's already kind of messy between the two.

That's a topic for another day.

> It's a common pattern in KVM selftests to have a syscall/ioctl wrapper
> foo() that asserts defaults and a __foo() that doesn't assert anything
> and allows tests to assert something else, but I have a contrary
> opinion.
> 
> I think it's better that tests be explicit about what they're testing
> for, so perhaps it's better to use macros like TEST_ASSERT_EQ() to
> explicitly call a function and check the results.

No, foo() and __foo() is a well-established pattern in the kernel, and in KVM
selftests it is a very well-established pattern for syscalls and ioctls.  And
I feel very, very strongly about handling errors in the core infrastructure.

Relying on developers to remember to add an assert is 100% guaranteed to result
in missed asserts.  That makes everyone's life painful, because inevitably an
ioctl will fail on someone else's system, and then they're stuck debugging a
super random failure with no insight into what the developer _meant_ to do.

And requiring developers to write (i.e. copy+paste) boring, uninteresting code
to handle failures adds a lot of friction to development, is a terrible use of
developers' time, and results in _awful_ error messages.  Bad or missing error
messages in tests have easily wasted tens of hours of just _my_ time; I suspect
the total cost throughout the KVM community can be measured in tens of days.

E.g. pop quiz, what state did I clobber that generated this error message with
a TEST_ASSERT_EQ(ret, 0)?  Answer at the bottom.

  ==== Test Assertion Failure ====
  lib/x86/processor.c:1128: ret == 0
  pid=2456 tid=2456 errno=22 - Invalid argument
     1	0x0000000000415465: vcpu_load_state at processor.c:1128
     2	0x0000000000402805: save_restore_vm at hyperv_evmcs.c:221
     3	0x000000000040204d: main at hyperv_evmcs.c:286
     4	0x000000000041df43: __libc_start_call_main at libc-start.o:?
     5	0x00000000004200ec: __libc_start_main_impl at ??:?
     6	0x0000000000402220: _start at ??:?
  0xffffffffffffffff != 0 (ret != 0)

You might say "oh, I can go look at the source".  But what if you don't have the
source because you got a test failure from CI?  Or because the assert came from
a bug report due to a failure in someone else's CI pipeline?

That is not a contrived example.  Before the ioctl assertion framework was added,
KVM selftests was littered with such garbage.  Note, I'm not blaming developers
in any way.  After having to add tens of asserts on KVM ioctls just to write a
simple test, it's entirely natural to become fatigued and start throwing in
TEST_ASSERT_EQ(ret, 0) or TEST_ASSERT(!ret, "ioctl failed").

There's also the mechanics of requiring the caller to assert.  KVM ioctls that
return a single value, e.g. register accessors, then need to use an out-param to
communicate the value or error code, e.g. this

	val = vcpu_get_reg(vcpu, reg_id);
	TEST_ASSERT_EQ(val, 0);

would become this:

	ret = vcpu_get_reg(vcpu, reg_id, &val);
	TEST_ASSERT_EQ(ret, 0);
	TEST_ASSERT_EQ(val, 0);

But of course, the developer would bundle that into:

	TEST_ASSERT(!ret && !val, "get_reg failed");

And then the user is really sad when the "!val" condition fails, because they
can't even tell.  Again, this isn't a contrived example, it literally happened to me
when dealing with the guest_memfd NUMA testcase, and was what prompted me to
write this syscall framework.  This also shows the typical error message that a
developer will write. 

This TEST_ASSERT() failed on me due to a misguided cleanup I made:

	ret = syscall(__NR_get_mempolicy, &get_policy, &get_nodemask,
		      maxnode, mem, MPOL_F_ADDR);
	TEST_ASSERT(!ret && get_policy == MPOL_DEFAULT && get_nodemask == 0,
		"Policy should be MPOL_DEFAULT and nodes zero");

generating this error message:

  ==== Test Assertion Failure ====
  guest_memfd_test.c:120: !ret && get_policy == MPOL_DEFAULT && get_nodemask == 0
  pid=52062 tid=52062 errno=22 - Invalid argument
     1	0x0000000000404113: test_mbind at guest_memfd_test.c:120 (discriminator 6)
     2	 (inlined by) __test_guest_memfd at guest_memfd_test.c:409 (discriminator 6)
     3	0x0000000000402320: test_guest_memfd at guest_memfd_test.c:432
     4	 (inlined by) main at guest_memfd_test.c:529
     5	0x000000000041eda3: __libc_start_call_main at libc-start.o:?
     6	0x0000000000420f4c: __libc_start_main_impl at ??:?
     7	0x00000000004025c0: _start at ??:?
  Policy should be MPOL_DEFAULT and nodes zero

At first glance, it would appear that get_mempolicy() failed with -EINVAL.  Nope.
ret==0, but errno was left set from an earlier syscall.  It took me a few minutes
of digging and a run with strace to figure out that get_mempolicy() succeeded.
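
The stale errno effect is easy to reproduce outside of KVM selftests.  A
minimal sketch using plain libc (stale_errno_after_success() is a made-up
name, and close()/access() are arbitrary stand-ins for "a failing call
followed by a successful one"):

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Returns the errno value observed after a *successful* call that was
 * preceded by a failing one.
 */
static int stale_errno_after_success(void)
{
	int ret;

	errno = 0;

	ret = close(-1);		/* fails and sets errno to EBADF */
	assert(ret == -1);
	assert(errno == EBADF);

	ret = access("/", F_OK);	/* succeeds, and does not clear errno */
	assert(ret == 0);

	return errno;			/* non-zero, typically still EBADF */
}
```

Library functions never reset errno to zero on success, so any assert that
prints errno unconditionally will happily report a failure that belongs to
some earlier call.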

Contrast that with:

        kvm_get_mempolicy(&policy, &nodemask, maxnode, mem, MPOL_F_ADDR);
        TEST_ASSERT(policy == MPOL_DEFAULT && !nodemask,
                    "Wanted MPOL_DEFAULT (%u) and nodemask 0x0, got %u and 0x%lx",
                    MPOL_DEFAULT, policy, nodemask);

  ==== Test Assertion Failure ====
  guest_memfd_test.c:120: policy == MPOL_DEFAULT && !nodemask
  pid=52700 tid=52700 errno=22 - Invalid argument
     1	0x0000000000404915: test_mbind at guest_memfd_test.c:120 (discriminator 6)
     2	 (inlined by) __test_guest_memfd at guest_memfd_test.c:407 (discriminator 6)
     3	0x0000000000402320: test_guest_memfd at guest_memfd_test.c:430
     4	 (inlined by) main at guest_memfd_test.c:527
     5	0x000000000041eda3: __libc_start_call_main at libc-start.o:?
     6	0x0000000000420f4c: __libc_start_main_impl at ??:?
     7	0x00000000004025c0: _start at ??:?
  Wanted MPOL_DEFAULT (0) and nodemask 0x0, got 1 and 0x1

Yeah, there's still some noise with errno=22, but it's fairly clear that the
returned values mismatch, and super obvious that the syscall succeeded when
looking at the code.  This is not a cherry-picked example.  There are hundreds,
if not thousands, of such asserts in KVM selftests and KVM-Unit-Tests in
particular.  And that's when developers _aren't_ forced to manually add boilerplate
asserts on ioctls succeeding.

For people that are completely new to KVM selftests, I can appreciate that it
might take a while to acclimate to the foo() and __foo() pattern, but I have a
hard time believing that it adds significant cognitive load after you've spent
a decent amount of time in KVM selftests.  And I 100% want to cater to the people
that are dealing with KVM selftests day in, day out.

> Or perhaps it should be more explicit, like in the name, that an
> assertion is made within this function?

No, that's entirely inflexible, will lead to confusion, and adds a copious amount
of noise.  E.g. this

	/* emulate hypervisor clearing CR4.OSXSAVE */
	vcpu_sregs_get(vcpu, &sregs);
	sregs.cr4 &= ~X86_CR4_OSXSAVE;
	vcpu_sregs_set(vcpu, &sregs);

versus

	/* emulate hypervisor clearing CR4.OSXSAVE */
	vcpu_sregs_get_assert(vcpu, &sregs);
	sregs.cr4 &= ~X86_CR4_OSXSAVE;
	vcpu_sregs_set_assert(vcpu, &sregs);

The "assert" is pure noise and makes it harder to see the "get" versus "set".

If we instead annotate the "no_assert" case, then we'll end up with ambiguous
cases where a developer won't be able to determine if an unannotated API asserts
or not, and conflict cases where a "no_assert" API _does_ assert, just not on the
primary ioctl it's invoking.

IMO, foo() and __foo() is quite explicit once you become accustomed to the
environment.

> In many cases a foo() exists without the corresponding __foo(), which
> seems to be discouraging testing for error cases.

That's almost always because no one has needed __foo().

> Also, I guess especially for vcpu_run(), tests would like to loop/take
> different actions based on different errnos and then it gets a bit
> unwieldy to have to avoid functions that have assertions within them.

vcpu_run() is a special case.  KVM_RUN is so much more than a normal ioctl, and
so having vcpu_run() follow the "standard" pattern isn't entirely feasible.

Speaking of vcpu_run(), and directly related to the idea of having developers manually
do TEST_ASSERT_EQ(), one of the top items on my selftests todo list is to have
vcpu_run() handle GUEST_ASSERT and GUEST_PRINTF whenever possible.  Having to add
UCALL_PRINTF handling just to get a debug message out of a test's guest code is
beyond frustrating.  Ditto for the 60+ tests that had to manually add UCALL_ABORT
handling, which leads to tests having code like this, which then gets copy+pasted
all over the place and becomes a nightmare to maintain.

static void __vcpu_run_expect(struct kvm_vcpu *vcpu, unsigned int cmd)
{
	struct ucall uc;

	vcpu_run(vcpu);
	switch (get_ucall(vcpu, &uc)) {
	case UCALL_ABORT:
		REPORT_GUEST_ASSERT(uc);
		break;
	default:
		if (uc.cmd == cmd)
			return;

		TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
	}
}

> I can see people forgetting to add TEST_ASSERT_EQ()s to check results of
> setup/teardown functions but I think those errors would surface some
> other way anyway.

Heh, I don't mean to be condescending, but I highly doubt you'll have this
opinion after you've had to debug a completely unfamiliar test that's failing
in weird ways, for the tenth time.

> Not a strongly-held opinion,

As you may have noticed, I have extremely strong opinions in this area :-)

> and no major concerns on the naming either. It's a selftest after all and
> IIUC we're okay to have selftest interfaces change anyway?

Yes, changes are fine.  It's the churn I want to avoid.

Oh, and here's the "answer" to the TEST_ASSERT_EQ() failure:

  ==== Test Assertion Failure ====
  include/kvm_util.h:794: !ret
  pid=43866 tid=43866 errno=22 - Invalid argument
     1	0x0000000000415486: vcpu_sregs_set at kvm_util.h:794 (discriminator 4)
     2	 (inlined by) vcpu_load_state at processor.c:1125 (discriminator 4)
     3	0x0000000000402805: save_restore_vm at hyperv_evmcs.c:221
     4	0x000000000040204d: main at hyperv_evmcs.c:286
     5	0x000000000041dfc3: __libc_start_call_main at libc-start.o:?
     6	0x000000000042016c: __libc_start_main_impl at ??:?
     7	0x0000000000402220: _start at ??:?
  KVM_SET_SREGS failed, rc: -1 errno: 22 (Invalid argument)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-30  7:53       ` Ackerley Tng
@ 2025-09-30 14:58         ` Sean Christopherson
  2025-10-01 10:26           ` Ackerley Tng
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-09-30 14:58 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

On Tue, Sep 30, 2025, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
> > How's this look?
> >
> > static void test_fault_sigbus(int fd, size_t accessible_size, size_t mmap_size)
> > {
> > 	struct sigaction sa_old, sa_new = {
> > 		.sa_handler = fault_sigbus_handler,
> > 	};
> > 	const uint8_t val = 0xaa;
> > 	uint8_t *mem;
> > 	size_t i;
> >
> > 	mem = kvm_mmap(mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> >
> > 	sigaction(SIGBUS, &sa_new, &sa_old);
> > 	if (sigsetjmp(jmpbuf, 1) == 0) {
> > 		memset(mem, val, mmap_size);
> > 		TEST_FAIL("memset() should have triggered SIGBUS");
> > 	}
> > 	if (sigsetjmp(jmpbuf, 1) == 0) {
> > 		(void)READ_ONCE(mem[accessible_size]);
> > 		TEST_FAIL("load at first unaccessible byte should have triggered SIGBUS");
> > 	}
> > 	sigaction(SIGBUS, &sa_old, NULL);
> >
> > 	for (i = 0; i < accessible_size; i++)
> > 		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
> >
> > 	kvm_munmap(mem, mmap_size);
> > }
> >
> > static void test_fault_overflow(int fd, size_t total_size)
> > {
> > 	test_fault_sigbus(fd, total_size, total_size * 4);
> > }
> >
> 
> Is it intentional that the same SIGBUS on offset mem + total_size is
> triggered twice? The memset would have worked fine until offset mem +
> total_size, which is the same SIGBUS case as mem[accessible_size]. Or
> was it meant to test that both read and write trigger SIGBUS?

The latter (test both read and write).  I plan on adding this in a separate
commit, i.e. it should be obvious in the actual patches.

> > static void test_fault_private(int fd, size_t total_size)
> > {
> > 	test_fault_sigbus(fd, 0, total_size);
> > }
> >
> 
> I would prefer more unrolling to avoid mental hoops within test code,
> perhaps like (not compile tested):
> 
> static void assert_host_fault_sigbus(uint8_t *mem)
> {
> 	struct sigaction sa_old, sa_new = {
> 		.sa_handler = fault_sigbus_handler,
> 	};
> 
> 	sigaction(SIGBUS, &sa_new, &sa_old);
> 	if (sigsetjmp(jmpbuf, 1) == 0) {
> 		(void)READ_ONCE(*mem);
> 		TEST_FAIL("Reading %p should have triggered SIGBUS", mem);
> 	}
> 	sigaction(SIGBUS, &sa_old, NULL);
> }
> 
> static void test_fault_overflow(int fd, size_t total_size)
> {
> 	size_t mmap_size = total_size * 2;
> 	const uint8_t val = 0xaa;
> 	uint8_t *mem;
> 	size_t i;
> 
> 	mem = kvm_mmap(mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> 
> 	/* Fill the accessible range so that the reads below have a known value. */
> 	memset(mem, val, total_size);
> 	for (i = 0; i < total_size; i++)
> 		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
> 
> 	assert_host_fault_sigbus(mem + total_size);
> 
> 	kvm_munmap(mem, mmap_size);
> }
> 
> static void test_fault_private(int fd, size_t total_size)
> {
> 	uint8_t *mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
> 
> 	assert_host_fault_sigbus(mem);
> 
> 	kvm_munmap(mem, total_size);
> }

Why?  That loses coverage for read to private memory getting SIGBUS.  I genuinely
don't understand the desire to copy+paste uninteresting code.

> assert_host_fault_sigbus() can then be flexibly reused for conversion

assert_host_fault_sigbus() is a misleading name in the sense that it suggests
that the _only_ thing the helper does is assert that a SIGBUS occurred.  It's
not at all obvious that there's a write to "mem" in there.

> tests (coming up) at various offsets from the mmap()-ed addresses.
> 
> At some point, sigaction, sigsetjmp, etc could perhaps even be further
> wrapped. For testing memory_failure() for guest_memfd we will want to
> check for SIGBUS on memory failure injection instead of on host fault.
> 
> Would be nice if it looked like this (maybe not in this patch series):
> 
> + TEST_ASSERT_WILL_SIGBUS(READ_ONCE(mem[i]))
> + TEST_ASSERT_WILL_SIGBUS(WRITE_ONCE(mem[i]))
> + TEST_ASSERT_WILL_SIGBUS(madvise(MADV_HWPOISON))

Ooh, me likey.  Definitely can do it now.  Using a macro means we can print out
the actual action that didn't generate a SIGBUS, e.g. hacking the test to read
byte 0 generates:

	'(void)READ_ONCE(mem[0])' should have triggered SIGBUS

Hmm, how about TEST_EXPECT_SIGBUS?  TEST_ASSERT_xxx() typically asserts on a
value, i.e. on the result of a previous action.  And s/WILL/EXPECT to make it
clear that the action is expected to SIGBUS _now_.

And if we use a descriptive global variable, we can extract the macro to e.g.
test_util.h or kvm_util.h (not sure we want to do that right away; probably best
left to the future).

static sigjmp_buf expect_sigbus_jmpbuf;
void fault_sigbus_handler(int signum)
{
	siglongjmp(expect_sigbus_jmpbuf, 1);
}

#define TEST_EXPECT_SIGBUS(action)						\
do {										\
	struct sigaction sa_old, sa_new = {					\
		.sa_handler = fault_sigbus_handler,				\
	};									\
										\
	sigaction(SIGBUS, &sa_new, &sa_old);					\
	if (sigsetjmp(expect_sigbus_jmpbuf, 1) == 0) {				\
		action;								\
		TEST_FAIL("'%s' should have triggered SIGBUS", #action);	\
	}									\
	sigaction(SIGBUS, &sa_old, NULL);					\
} while (0)

static void test_fault_sigbus(int fd, size_t accessible_size, size_t map_size)
{
	const char val = 0xaa;
	char *mem;
	size_t i;

	mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);

	TEST_EXPECT_SIGBUS(memset(mem, val, map_size));
	TEST_EXPECT_SIGBUS((void)READ_ONCE(mem[accessible_size]));

	for (i = 0; i < accessible_size; i++)
		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);

	kvm_munmap(mem, map_size);
}

> >> If split up as described above, this could be
> >> 
> >> 	if (flags & GUEST_MEMFD_FLAG_MMAP &&
> >> 	    flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED) {
> >> 		gmem_test(mmap_supported_fault_supported, vm, flags);
> >> 		gmem_test(fault_overflow, vm, flags);
> >> 	} else if (flags & GUEST_MEMFD_FLAG_MMAP) {
> >> 		gmem_test(mmap_supported_fault_sigbus, vm, flags);
> >
> > I find these unintuitive, e.g. is this one "mmap() supported, test fault sigbus",
> > or is it "mmap(), test supported fault sigbus".  I also don't like that some of
> > > the test names describe the _result_ (SIGBUS), whereas others describe _what_
> > is being tested.
> >
> 
> I think of the result (SIGBUS) as part of what's being tested. So
> test_supported_fault_sigbus() is testing that mmap is supported, and
> faulting will result in a SIGBUS.

For a utility helper, e.g. test_fault_sigbus(), or test_write_sigbus(), that's
a-ok.  But it doesn't work for the top-level test functions because trying to
follow that pattern effectively prevents bundling multiple individual testcases,
e.g. test_fallocate() becomes what?  And test_invalid_punch_hole_einval() is
quite obnoxious.

> > In general, I don't like test names that describe the result, because IMO what
> > is being tested is far more interesting.  E.g. from a test coverage perspective,
> > I don't care if attempting to fault in (CoCo) private memory gets SIGBUS versus
> > SIGSEGV, but I most definitely care that we have test coverage for the "what".
> >
> 
> The SIGBUS is part of the contract with userspace and that's also part
> of what's being tested IMO.

I don't disagree, but IMO bleeding those details into the top-level functions
isn't necessary.  Random developer that comes along isn't going to care whether
KVM is supposed to SIGBUS or SIGSEGV unless there is a failure.  And as above,
doing so either singles out sigbus or necessitates truly funky names.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success
  2025-09-30 14:24         ` Sean Christopherson
@ 2025-10-01 10:18           ` Ackerley Tng
  0 siblings, 0 replies; 55+ messages in thread
From: Ackerley Tng @ 2025-10-01 10:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Sep 30, 2025, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> > To be perfectly honest, I forgot test_util.h existed :-)
>>
>> Merging/dropping one of kvm_util.h vs test_util.h is a good idea. The
>> distinction is not clear and it's already kind of messy between the two.
>
> That's a topic for another day.
>
>> It's a common pattern in KVM selftests to have a syscall/ioctl wrapper
>> foo() that asserts defaults and a __foo() that doesn't assert anything
>> and allows tests to assert something else, but I have a contrary
>> opinion.
>>
>> I think it's better that tests be explicit about what they're testing
>> for, so perhaps it's better to use macros like TEST_ASSERT_EQ() to
>> explicitly call a function and check the results.
>
> No, foo() and __foo() is a well-established pattern in the kernel, and in KVM
> selftests it is a very well-established pattern for syscalls and ioctls.  And
> I feel very, very strongly about handling errors in the core infrastructure.
>
> Relying on developers to remember to add an assert is 100% guaranteed to result
> in missed asserts.  That makes everyone's life painful, because inevitably an
> ioctl will fail on someone else's system, and then they're stuck debugging a
> super random failure with no insight into what the developer _meant_ to do.
>
> And requiring developers to write (i.e. copy+paste) boring, uninteresting code
> to handle failures adds a lot of friction to development, is a terrible use of
> developers' time, and results in _awful_ error messages.  Bad or missing error
> messages in tests have easily wasted tens of hours of just _my_ time; I suspect
> the total cost throughout the KVM community can be measured in tens of days.
>
> E.g. pop quiz, what state did I clobber that generated this error message with
> a TEST_ASSERT_EQ(ret, 0)?  Answer at the bottom.
>
>   ==== Test Assertion Failure ====
>   lib/x86/processor.c:1128: ret == 0
>   pid=2456 tid=2456 errno=22 - Invalid argument
>      1	0x0000000000415465: vcpu_load_state at processor.c:1128
>      2	0x0000000000402805: save_restore_vm at hyperv_evmcs.c:221
>      3	0x000000000040204d: main at hyperv_evmcs.c:286
>      4	0x000000000041df43: __libc_start_call_main at libc-start.o:?
>      5	0x00000000004200ec: __libc_start_main_impl at ??:?
>      6	0x0000000000402220: _start at ??:?
>   0xffffffffffffffff != 0 (ret != 0)
>
> You might say "oh, I can go look at the source".  But what if you don't have the
> source because you got a test failure from CI?  Or because the assert came from
> a bug report due to a failure in someone else's CI pipeline?
>
> That is not a contrived example.  Before the ioctl assertion framework was added,
> KVM selftests was littered with such garbage.  Note, I'm not blaming developers
> in any way.  After having to add tens of asserts on KVM ioctls just to write a
> simple test, it's entirely natural to become fatigued and start throwing in
> TEST_ASSERT_EQ(ret, 0) or TEST_ASSERT(!ret, "ioctl failed").
>
> There's also the mechanics of requiring the caller to assert.  KVM ioctls that
> return a single value, e.g. register accessors, then need to use an out-param to
> communicate the value or error code, e.g. this
>
> 	val = vcpu_get_reg(vcpu, reg_id);
> 	TEST_ASSERT_EQ(val, 0);
>
> would become this:
>
> 	ret = vcpu_get_reg(vcpu, reg_id, &val);
> 	TEST_ASSERT_EQ(ret, 0);
> 	TEST_ASSERT_EQ(val, 0);
>
> But of course, the developer would bundle that into:
>
> 	TEST_ASSERT(!ret && !val, "get_reg failed");
>
> And then the user is really sad when the "!val" condition fails, because they
> can't even tell.  Again, this isn't a contrived example, it literally happened to me
> when dealing with the guest_memfd NUMA testcase, and was what prompted me to
> write this syscall framework.  This also shows the typical error message that a
> developer will write.
>
> This TEST_ASSERT() failed on me due to a misguided cleanup I made:
>
> 	ret = syscall(__NR_get_mempolicy, &get_policy, &get_nodemask,
> 		      maxnode, mem, MPOL_F_ADDR);
> 	TEST_ASSERT(!ret && get_policy == MPOL_DEFAULT && get_nodemask == 0,
> 		"Policy should be MPOL_DEFAULT and nodes zero");
>
> generating this error message:
>
>   ==== Test Assertion Failure ====
>   guest_memfd_test.c:120: !ret && get_policy == MPOL_DEFAULT && get_nodemask == 0
>   pid=52062 tid=52062 errno=22 - Invalid argument
>      1	0x0000000000404113: test_mbind at guest_memfd_test.c:120 (discriminator 6)
>      2	 (inlined by) __test_guest_memfd at guest_memfd_test.c:409 (discriminator 6)
>      3	0x0000000000402320: test_guest_memfd at guest_memfd_test.c:432
>      4	 (inlined by) main at guest_memfd_test.c:529
>      5	0x000000000041eda3: __libc_start_call_main at libc-start.o:?
>      6	0x0000000000420f4c: __libc_start_main_impl at ??:?
>      7	0x00000000004025c0: _start at ??:?
>   Policy should be MPOL_DEFAULT and nodes zero
>
> At first glance, it would appear that get_mempolicy() failed with -EINVAL.  Nope.
> ret==0, but errno was left set from an earlier syscall.  It took me a few minutes
> of digging and a run with strace to figure out that get_mempolicy() succeeded.
>
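
The stale-errno trap described above is easy to reproduce outside of KVM.  A
minimal sketch, assuming a typical libc (e.g. glibc or musl) whose syscall
wrappers write errno only on failure; the helper name errno_after_success()
is made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: fail one syscall, then succeed at others, and return
 * the errno value that an assert would report after the successful calls.
 */
static int errno_after_success(void)
{
	int fds[2];
	int ret;

	/* Create two valid descriptors that can be closed successfully later. */
	ret = pipe(fds);
	assert(ret == 0);

	/* Earlier, unrelated failure: close() on a bad fd sets errno to EBADF. */
	ret = close(-1);
	assert(ret == -1);

	/* These calls succeed and, on typical libcs, leave errno untouched. */
	ret = close(fds[0]);
	assert(ret == 0);
	ret = close(fds[1]);
	assert(ret == 0);

	return errno;
}
```

Under that assumption the helper returns EBADF even though the last two calls
returned 0, which is why the "errno=22" line in the report above says nothing
about whether get_mempolicy() itself failed.
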
> Contrast that with:
>
>         kvm_get_mempolicy(&policy, &nodemask, maxnode, mem, MPOL_F_ADDR);
>         TEST_ASSERT(policy == MPOL_DEFAULT && !nodemask,
>                     "Wanted MPOL_DEFAULT (%u) and nodemask 0x0, got %u and 0x%lx",
>                     MPOL_DEFAULT, policy, nodemask);
>
>   ==== Test Assertion Failure ====
>   guest_memfd_test.c:120: policy == MPOL_DEFAULT && !nodemask
>   pid=52700 tid=52700 errno=22 - Invalid argument
>      1	0x0000000000404915: test_mbind at guest_memfd_test.c:120 (discriminator 6)
>      2	 (inlined by) __test_guest_memfd at guest_memfd_test.c:407 (discriminator 6)
>      3	0x0000000000402320: test_guest_memfd at guest_memfd_test.c:430
>      4	 (inlined by) main at guest_memfd_test.c:527
>      5	0x000000000041eda3: __libc_start_call_main at libc-start.o:?
>      6	0x0000000000420f4c: __libc_start_main_impl at ??:?
>      7	0x00000000004025c0: _start at ??:?
>   Wanted MPOL_DEFAULT (0) and nodemask 0x0, got 1 and 0x1
>
> Yeah, there's still some noise with errno=22, but it's fairly clear that the
> returned values mismatch, and super obvious that the syscall succeeded when
> looking at the code.  This is not a cherry-picked example.  There are hundreds,
> if not thousands, of such asserts in KVM selftests and KVM-Unit-Tests in
> particular.  And that's when developers _aren't_ forced to manually add boilerplate
> asserts on ioctls succeeding.
>
> For people that are completely new to KVM selftests, I can appreciate that it
> might take a while to acclimate to the foo() and __foo() pattern, but I have a
> hard time believing that it adds significant cognitive load after you've spent
> a decent amount of time in KVM selftests.  And I 100% want to cater to the people
> that are dealing with KVM selftests day in, day out.
>

Thanks for taking the time to write this up. I'm going to start a list
of "most useful explanations" and this will go on that list.
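
For anyone new to the convention, a minimal sketch of the foo() / __foo()
split in plain C.  This is not the selftests implementation: ex_close() and
__ex_close() are hypothetical names, and fprintf() + abort() stand in for
TEST_ASSERT().  The raw variant hands back the result for negative tests, the
asserting variant reports what failed and why:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Raw variant: returns the syscall result so a test can check error paths. */
static inline int __ex_close(int fd)
{
	return close(fd);
}

/* Asserting variant: callers that expect success need no boilerplate. */
static inline void ex_close(int fd)
{
	int ret = __ex_close(fd);

	if (ret) {
		fprintf(stderr, "close(%d) failed, rc: %d errno: %d (%s)\n",
			fd, ret, errno, strerror(errno));
		abort();
	}
}

/* Illustrative caller: common path via ex_close(), error path via __ex_close(). */
static void ex_close_usage(void)
{
	int fds[2];
	int ret;

	ret = pipe(fds);
	assert(ret == 0);

	ex_close(fds[0]);

	/* Closing the same fd again must fail with EBADF. */
	ret = __ex_close(fds[0]);
	assert(ret == -1);
	assert(errno == EBADF);

	ex_close(fds[1]);
}
```

The point of the split is that the message is written once, next to the
syscall, instead of being re-invented (or skipped) at every call site.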

>> Or perhaps it should be more explicit, like in the name, that an
>> assertion is made within this function?
>
> No, that's entirely inflexible, will lead to confusion, and adds a copious amount
> of noise.  E.g. this
>
> 	/* emulate hypervisor clearing CR4.OSXSAVE */
> 	vcpu_sregs_get(vcpu, &sregs);
> 	sregs.cr4 &= ~X86_CR4_OSXSAVE;
> 	vcpu_sregs_set(vcpu, &sregs);
>
> versus
>
> 	/* emulate hypervisor clearing CR4.OSXSAVE */
> 	vcpu_sregs_get_assert(vcpu, &sregs);
> 	sregs.cr4 &= ~X86_CR4_OSXSAVE;
> 	vcpu_sregs_set_assert(vcpu, &sregs);
>
> The "assert" is pure noise and makes it harder to see the "get" versus "set".
>
> If we instead annotate the "no_assert" case, then we'll end up with ambiguous
> cases where a developer won't be able to determine if an unannotated API asserts
> or not, and conflict cases where a "no_assert" API _does_ assert, just not on the
> primary ioctl it's invoking.
>
> IMO, foo() and __foo() is quite explicit once you become accustomed to the
> environment.
>
>> In many cases a foo() exists without the corresponding __foo(), which
>> seems to be discouraging testing for error cases.
>
> That's almost always because no one has needed __foo().
>
>> Also, I guess especially for vcpu_run(), tests would like to loop/take
>> different actions based on different errnos and then it gets a bit
>> unwieldy to have to avoid functions that have assertions within them.
>
> vcpu_run() is a special case.  KVM_RUN is so much more than a normal ioctl, and
> so having vcpu_run() follow the "standard" pattern isn't entirely feasible.
>
> Speaking of vcpu_run(), and directly related to the idea of having developers manually
> do TEST_ASSERT_EQ(), one of the top items on my selftests todo list is to have
> vcpu_run() handle GUEST_ASSERT and GUEST_PRINTF whenever possible.  Having to add
> UCALL_PRINTF handling just to get a debug message out of a test's guest code is
> beyond frustrating.  Ditto for the 60+ tests that had to manually add UCALL_ABORT
> handling, which leads to tests having code like this, which then gets copy+pasted
> all over the place and becomes a nightmare to maintain.

+1000 this is exactly where I had to avoid assertions!

>
> static void __vcpu_run_expect(struct kvm_vcpu *vcpu, unsigned int cmd)
> {
> 	struct ucall uc;
>
> 	vcpu_run(vcpu);
> 	switch (get_ucall(vcpu, &uc)) {
> 	case UCALL_ABORT:
> 		REPORT_GUEST_ASSERT(uc);
> 		break;
> 	default:
> 		if (uc.cmd == cmd)
> 			return;
>
> 		TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
> 	}
> }
>
>> I can see people forgetting to add TEST_ASSERT_EQ()s to check results of
>> setup/teardown functions but I think those errors would surface some
>> other way anyway.
>
> Heh, I don't mean to be condescending, but I highly doubt you'll have this
> opinion after you've had to debug a completely unfamiliar test that's failing
> in weird ways, for the tenth time.
>
>> Not a strongly-held opinion,
>
> As you may have noticed, I have extremely strong opinions in this area :-)
>
>> and no major concerns on the naming either. It's a selftest after all and
>> IIUC we're okay to have selftest interfaces change anyway?
>
> Yes, changes are fine.  It's the churn I want to avoid.
>
> Oh, and here's the "answer" to the TEST_ASSERT_EQ() failure:
>
>   ==== Test Assertion Failure ====
>   include/kvm_util.h:794: !ret
>   pid=43866 tid=43866 errno=22 - Invalid argument
>      1	0x0000000000415486: vcpu_sregs_set at kvm_util.h:794 (discriminator 4)
>      2	 (inlined by) vcpu_load_state at processor.c:1125 (discriminator 4)
>      3	0x0000000000402805: save_restore_vm at hyperv_evmcs.c:221
>      4	0x000000000040204d: main at hyperv_evmcs.c:286
>      5	0x000000000041dfc3: __libc_start_call_main at libc-start.o:?
>      6	0x000000000042016c: __libc_start_main_impl at ??:?
>      7	0x0000000000402220: _start at ??:?
>   KVM_SET_SREGS failed, rc: -1 errno: 22 (Invalid argument)


* Re: [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails
  2025-09-30 14:58         ` Sean Christopherson
@ 2025-10-01 10:26           ` Ackerley Tng
  0 siblings, 0 replies; 55+ messages in thread
From: Ackerley Tng @ 2025-10-01 10:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, David Hildenbrand,
	Fuad Tabba

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Sep 30, 2025, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> [...snip...]
>> 
>> 
>> At some point, sigaction, sigsetjmp, etc could perhaps even be further
>> wrapped. For testing memory_failure() for guest_memfd we will want to
>> check for SIGBUS on memory failure injection instead of on host fault.
>> 
>> Would be nice if it looked like this (maybe not in this patch series):
>> 
>> + TEST_ASSERT_WILL_SIGBUS(READ_ONCE(mem[i]))
>> + TEST_ASSERT_WILL_SIGBUS(WRITE_ONCE(mem[i]))
>> + TEST_ASSERT_WILL_SIGBUS(madvise(MADV_HWPOISON))
>
> Ooh, me likey.  Definitely can do it now.  Using a macro means we can print out
> the actual action that didn't generate a SIGBUS, e.g. hacking the test to read
> byte 0 generates:
>
> 	'(void)READ_ONCE(mem[0])' should have triggered SIGBUS
>
> Hmm, how about TEST_EXPECT_SIGBUS?  TEST_ASSERT_xxx() typically asserts on a
> value, i.e. on the result of a previous action.  And s/WILL/EXPECT to make it
> clear that the action is expected to SIGBUS _now_.
>
> And if we use a descriptive global variable, we can extract the macro to e.g.
> test_util.h or kvm_util.h (not sure we want to do that right away; probably best
> left to the future).
>
> static sigjmp_buf expect_sigbus_jmpbuf;
> void fault_sigbus_handler(int signum)
> {
> 	siglongjmp(expect_sigbus_jmpbuf, 1);
> }
>
> #define TEST_EXPECT_SIGBUS(action)						\
> do {										\
> 	struct sigaction sa_old, sa_new = {					\
> 		.sa_handler = fault_sigbus_handler,				\
> 	};									\
> 										\
> 	sigaction(SIGBUS, &sa_new, &sa_old);					\
> 	if (sigsetjmp(expect_sigbus_jmpbuf, 1) == 0) {				\
> 		action;								\
> 		TEST_FAIL("'%s' should have triggered SIGBUS", #action);	\
> 	}									\
> 	sigaction(SIGBUS, &sa_old, NULL);					\
> } while (0)
>
> static void test_fault_sigbus(int fd, size_t accessible_size, size_t map_size)
> {
> 	const char val = 0xaa;
> 	char *mem;
> 	size_t i;
>
> 	mem = kvm_mmap(map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
>
> 	TEST_EXPECT_SIGBUS(memset(mem, val, map_size));
> 	TEST_EXPECT_SIGBUS((void)READ_ONCE(mem[accessible_size]));
>
> 	for (i = 0; i < accessible_size; i++)
> 		TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
>
> 	kvm_munmap(mem, map_size);
> }
>

Awesome! Thanks!

And thanks for the explanations on the other suggestions.
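
For experimenting with the sigsetjmp()/siglongjmp() mechanics outside of the
selftests tree, a standalone sketch of the same idea using only libc.  The
EX_* names are hypothetical, raise(SIGBUS) stands in for a faulting access to
guest_memfd memory, and the macro records the outcome in a flag instead of
calling TEST_FAIL():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

static sigjmp_buf ex_sigbus_jmpbuf;

/* Jump back to the sigsetjmp() call site, making it return 1. */
static void ex_sigbus_handler(int signum)
{
	(void)signum;
	siglongjmp(ex_sigbus_jmpbuf, 1);
}

/* Run 'action' and store in 'got' whether it raised SIGBUS. */
#define EX_RUN_EXPECT_SIGBUS(action, got)			\
do {								\
	struct sigaction sa_old, sa_new = {			\
		.sa_handler = ex_sigbus_handler,		\
	};							\
								\
	sigaction(SIGBUS, &sa_new, &sa_old);			\
	/* savemask=1 so SIGBUS is unblocked after the jump. */	\
	if (sigsetjmp(ex_sigbus_jmpbuf, 1) == 0) {		\
		action;						\
		(got) = false;					\
	} else {						\
		(got) = true;					\
	}							\
	/* Restore the previous handler on both paths. */	\
	sigaction(SIGBUS, &sa_old, NULL);			\
} while (0)

/* Illustrative caller; raise() is a stand-in for the real faulting access. */
static void ex_sigbus_usage(void)
{
	volatile bool got = false;

	EX_RUN_EXPECT_SIGBUS(raise(SIGBUS), got);
	assert(got);
}
```

Passing 1 as the second argument to sigsetjmp() matters here: the signal mask
is restored on siglongjmp(), so a second expected SIGBUS is still delivered.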

>> 
>> [...snip...]
>> 


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-09-30  0:15               ` Sean Christopherson
  2025-09-30  8:36                 ` Ackerley Tng
@ 2025-10-01 14:22                 ` Vishal Annapurve
  2025-10-01 16:15                   ` Sean Christopherson
  1 sibling, 1 reply; 55+ messages in thread
From: Vishal Annapurve @ 2025-10-01 14:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Nikita Kalyazin, shivankg

On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
>
>  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
>     that we don't need to add a capability every time a new flag comes along,
>     and so that userspace can gather all flags in a single ioctl.  If gmem ever
>     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
>     that's a non-issue relatively speaking.
>

Guest_memfd capabilities don't necessarily translate into flags, so ideally:
1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
KVM_CAP_GUEST_MEMFD_CAPS.
2) IMO they should both support namespace of 64 values at least from the get go.
3) The reservation scheme for upstream should ideally be LSB's first
for the new caps/flags.

guest_memfd will achieve multiple features in future, both upstream
and in out-of-tree versions to deploy features before they make their
way upstream. Generally the scheme followed by out-of-tree versions is
to define a custom UAPI that won't conflict with upstream UAPIs in
near future. Having a namespace of 32 values gives little space to
avoid the conflict, e.g. features like hugetlb support will have to
eat up at least 5 bits from the flags [1].

[1] https://elixir.bootlin.com/linux/v6.17/source/include/uapi/asm-generic/hugetlb_encode.h#L20


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-01 14:22                 ` Vishal Annapurve
@ 2025-10-01 16:15                   ` Sean Christopherson
  2025-10-01 16:31                     ` Vishal Annapurve
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-10-01 16:15 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Nikita Kalyazin, shivankg

On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> >
> >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> >     that we don't need to add a capability every time a new flag comes along,
> >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> >     that's a non-issue relatively speaking.
> >
> 
> Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> KVM_CAP_GUEST_MEMFD_CAPS.

I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.

> 2) IMO they should both support namespace of 64 values at least from the get go.

It's a limitation of KVM_CHECK_EXTENSION, and all of KVM's plumbing for ioctls.
Because KVM still supports 32-bit architectures, direct returns from ioctls are
forced to fit in 32-bit values to avoid unintentionally creating different ABI
for 32-bit vs. 64-bit kernels.

We could add KVM_CHECK_EXTENSION2 or KVM_CHECK_EXTENSION64 or something, but I
honestly don't see the point.  The odds of guest_memfd supporting >32 flags is
small, and the odds of that happening in the next ~5 years is basically zero.
All so that userspace can make one syscall instead of two for a path that isn't
remotely performance critical.

So while I agree that being able to enumerate 64 flags from the get-go would be
nice to have, it's simply not worth the effort (unless someone has a clever idea).

> 3) The reservation scheme for upstream should ideally be LSB's first
> for the new caps/flags.

We're getting way ahead of ourselves.  Nothing needs KVM_CAP_GUEST_MEMFD_CAPS at
this time, so there's nothing to discuss.

> guest_memfd will achieve multiple features in future, both upstream
> and in out-of-tree versions to deploy features before they make their

When it comes to upstream uAPI and uABI, out-of-tree kernel code is irrelevant.

> way upstream. Generally the scheme followed by out-of-tree versions is
> to define a custom UAPI that won't conflict with upstream UAPIs in
> near future. Having a namespace of 32 values gives little space to
> avoid the conflict, e.g. features like hugetlb support will have to
> eat up at least 5 bits from the flags [1].

Why on earth would out-of-tree code use KVM_CAP_GUEST_MEMFD_FLAGS?   Providing
infrastructure to support an infinite (quite literally) number of out-of-tree
capabilities and sub-ioctls, with practically zero chance of conflict, is not
difficult.  See internal b/378111418.

But as above, this is not upstream's problem to solve.

> [1] https://elixir.bootlin.com/linux/v6.17/source/include/uapi/asm-generic/hugetlb_encode.h#L20


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-01 16:15                   ` Sean Christopherson
@ 2025-10-01 16:31                     ` Vishal Annapurve
  2025-10-01 17:16                       ` Sean Christopherson
  0 siblings, 1 reply; 55+ messages in thread
From: Vishal Annapurve @ 2025-10-01 16:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Nikita Kalyazin, shivankg

On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > >
> > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > >     that we don't need to add a capability every time a new flag comes along,
> > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > >     that's a non-issue relatively speaking.
> > >
> >
> > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > KVM_CAP_GUEST_MEMFD_CAPS.
>
> I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.

Ah, ok. Then do you envision the guest_memfd caps to still be separate
KVM caps per guest_memfd feature?

>
> > 2) IMO they should both support namespace of 64 values at least from the get go.
>
> It's a limitation of KVM_CHECK_EXTENSION, and all of KVM's plumbing for ioctls.
> Because KVM still supports 32-bit architectures, direct returns from ioctls are
> forced to fit in 32-bit values to avoid unintentionally creating different ABI
> for 32-bit vs. 64-bit kernels.
>
> We could add KVM_CHECK_EXTENSION2 or KVM_CHECK_EXTENSION64 or something, but I
> honestly don't see the point.  The odds of guest_memfd supporting >32 flags is
> small, and the odds of that happening in the next ~5 years is basically zero.
> All so that userspace can make one syscall instead of two for a path that isn't
> remotely performance critical.
>
> So while I agree that being able to enumerate 64 flags from the get-go would be
> nice to have, it's simply not worth the effort (unless someone has a clever idea).

Ack.

>
> > 3) The reservation scheme for upstream should ideally be LSB's first
> > for the new caps/flags.
>
> We're getting way ahead of ourselves.  Nothing needs KVM_CAP_GUEST_MEMFD_CAPS at
> this time, so there's nothing to discuss.
>
> > guest_memfd will achieve multiple features in future, both upstream
> > and in out-of-tree versions to deploy features before they make their
>
> When it comes to upstream uAPI and uABI, out-of-tree kernel code is irrelevant.
>
> > way upstream. Generally the scheme followed by out-of-tree versions is
> > to define a custom UAPI that won't conflict with upstream UAPIs in
> > near future. Having a namespace of 32 values gives little space to
> > avoid the conflict, e.g. features like hugetlb support will have to
> > eat up at least 5 bits from the flags [1].
>
> Why on earth would out-of-tree code use KVM_CAP_GUEST_MEMFD_FLAGS?   Providing

I can imagine a scenario where KVM_CAP_GUEST_MEMFD_FLAGS is upstreamed
and more flags landing in KVM_CAP_GUEST_MEMFD_FLAGS as supported over
time afterwards. out-of-tree code may ingest KVM_CAP_GUEST_MEMFD_FLAGS
in between.

> infrastructure to support an infinite (quite literally) number of out-of-tree
> capabilities and sub-ioctls, with practically zero chance of conflict, is not
> difficult.  See internal b/378111418.
>
> But as above, this is not upstream's problem to solve.
>
> > [1] https://elixir.bootlin.com/linux/v6.17/source/include/uapi/asm-generic/hugetlb_encode.h#L20


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-01 16:31                     ` Vishal Annapurve
@ 2025-10-01 17:16                       ` Sean Christopherson
  2025-10-01 22:13                         ` Vishal Annapurve
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-10-01 17:16 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Nikita Kalyazin, shivankg

On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > > >
> > > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > >     that we don't need to add a capability every time a new flag comes along,
> > > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > >     that's a non-issue relatively speaking.
> > > >
> > >
> > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > KVM_CAP_GUEST_MEMFD_CAPS.
> >
> > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> 
> Ah, ok. Then do you envision the guest_memfd caps to still be separate
> KVM caps per guest_memfd feature?

Yes?  No?  It depends on the feature and the actual implementation.  E.g.
KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-01 17:16                       ` Sean Christopherson
@ 2025-10-01 22:13                         ` Vishal Annapurve
  2025-10-02  0:04                           ` Sean Christopherson
  0 siblings, 1 reply; 55+ messages in thread
From: Vishal Annapurve @ 2025-10-01 22:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Nikita Kalyazin, shivankg

On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > >
> > > > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > > > >
> > > > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > >     that we don't need to add a capability every time a new flag comes along,
> > > > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > > > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > >     that's a non-issue relatively speaking.
> > > > >
> > > >
> > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > >
> > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> >
> > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > KVM caps per guest_memfd feature?
>
> Yes?  No?  It depends on the feature and the actual implementation.  E.g.
> KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.

I think I am confused. Is the proposal here as follows?
* Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
creation flags.
* Use KVM caps for guest_memfd features that don't map to any flags.

I think in general it would be better to have a KVM cap for each
feature irrespective of the flags as the feature may also need
additional UAPIs like IOCTLs.

I fail to see the benefits of KVM_CAP_GUEST_MEMFD_FLAGS over
KVM_CAP_GUEST_MEMFD_MMAP:
1) It limits the possible values to 32 even though we could pass 64 flags to
the original ioctl.
2) Userspace has to anyways assume the semantics of each bit position.
3) Userspace still has to check for caps for features that carry extra
UAPI baggage.

KVM_CAP_GUEST_MEMFD_MMAP allows userspace to assume that mmap is
supported and userspace can just pass in the mmap flag that it anyways
has to assume.


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-01 22:13                         ` Vishal Annapurve
@ 2025-10-02  0:04                           ` Sean Christopherson
  2025-10-02 15:41                             ` Vishal Annapurve
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-10-02  0:04 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Nikita Kalyazin, shivankg

On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > >
> > > > > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > > > > >
> > > > > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > >     that we don't need to add a capability every time a new flag comes along,
> > > > > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > > > > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > >     that's a non-issue relatively speaking.
> > > > > >
> > > > >
> > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > >
> > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > >
> > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > KVM caps per guest_memfd feature?
> >
> > Yes?  No?  It depends on the feature and the actual implementation.  E.g.
> > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
> 
> I think I am confused. Is the proposal here as follows?
> * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> creation flags.

No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
supported KVM_CREATE_GUEST_MEMFD flags.  Whether or not there is an associated
"feature" is irrelevant.  I.e. it's a very literal "these are the supported
flags".

> * Use KVM caps for guest_memfd features that don't map to any flags.
> 
> I think in general it would be better to have a KVM cap for each
> feature irrespective of the flags as the feature may also need
                                                   ^^^
> additional UAPIs like IOCTLs.

If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
a CAP is gross overkill.  Even if there are other assets that accompany the new
flag, there's no reason we couldn't say "this feature exists if XYZ flag is
supported".

E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
far beyond being able to create a VM of type KVM_X86_TDX_VM.

KVM_CAP_XEN_HVM is a big collection of flags that have very little in common other
than being for Xen emulation.

> I fail to see the benefits of KVM_CAP_GUEST_MEMFD_FLAGS over
> KVM_CAP_GUEST_MEMFD_MMAP:

Adding a new flag doesn't require all of the things that come along with a new
capability.  E.g. there's zero chance of collisions between maintainer sub-trees
(at least as far as capabilities go; if multiple maintainers are adding multiple
gmem flags in the same kernel release, I really hope they'd be coordinating).

Enumerating in userspace is also more natural, e.g. userspace doesn't have to
manually build the mask of valid flags.
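
To make the enumeration point concrete, here is a minimal sketch, not real VMM
code.  The capability query is stubbed out: query_cap() and the CAP_GMEM_*
numbers are invented for illustration, and only the two GUEST_MEMFD_FLAG_*
values mirror the uAPI definitions in this thread.  With one capability per
flag, userspace assembles the mask by hand; with a single FLAGS capability, the
return value already is the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in the uAPI diff in this mail. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)

/* Hypothetical capability IDs, for illustration only. */
enum { CAP_GMEM_FLAGS = 1, CAP_GMEM_MMAP = 2, CAP_GMEM_INIT_SHARED = 3 };

/* Stub standing in for KVM_CHECK_EXTENSION; a real VMM would issue the ioctl. */
static int query_cap(int cap)
{
	switch (cap) {
	case CAP_GMEM_FLAGS:
		return (int)(GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED);
	case CAP_GMEM_MMAP:
	case CAP_GMEM_INIT_SHARED:
		return 1;
	default:
		return 0;
	}
}

/* One capability per flag: userspace builds the mask manually. */
static uint64_t valid_flags_per_cap(void)
{
	uint64_t flags = 0;

	if (query_cap(CAP_GMEM_MMAP))
		flags |= GUEST_MEMFD_FLAG_MMAP;
	if (query_cap(CAP_GMEM_INIT_SHARED))
		flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
	return flags;
}

/* A single FLAGS capability: the return value is the mask. */
static uint64_t valid_flags_single_cap(void)
{
	int ret = query_cap(CAP_GMEM_FLAGS);

	assert(ret >= 0);
	return (uint64_t)ret;
}
```

Both helpers yield the same mask here, but only the first one has to grow a new
branch every time a flag is added.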

Writing documentation should be much easier (much less boilerplate), e.g. the
sum total of uAPI for adding GUEST_MEMFD_FLAG_INIT_SHARED is:

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7ba92f2ced38..754b662a453c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6438,6 +6438,11 @@ specified via KVM_CREATE_GUEST_MEMFD.  Currently defined flags:
   ============================ ================================================
   GUEST_MEMFD_FLAG_MMAP        Enable using mmap() on the guest_memfd file
                                descriptor.
+  GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
+                               KVM_CREATE_GUEST_MEMFD (memory files created
+                               without INIT_SHARED will be marked private).
+                               Shared memory can be faulted into host userspace
+                               page tables. Private memory cannot.
   ============================ ================================================
 
 When the KVM MMU performs a PFN lookup to service a guest fault and the backing
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b1d52d0c56ec..52f6000ab020 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
-#define GUEST_MEMFD_FLAG_MMAP  (1ULL << 0)
+#define GUEST_MEMFD_FLAG_MMAP          (1ULL << 0)
+#define GUEST_MEMFD_FLAG_INIT_SHARED   (1ULL << 1)
 
 struct kvm_create_guest_memfd {
        __u64 size;

> 1) It limits the possible values to 32 even though we could pass 64 flags to
> the original ioctl.

So because we're currently limited to 32 flags, we should instead throw in the
towel and artificially limit ourselves to 1 flag (0 or 1)?  Because for all intents
and purposes, that's what we'd be doing.

Again, that is unlikely to be problematic before I retire.  It might not even be
a problem _ever_, because with luck we'll kill off 32-bit KVM in the next few
years and then we can actually leverage returning a "long" from ioctls.  Literally
every capability that returns a mask of flags has this "problem"; it's not notable
or even an issue in practice.
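
For illustration only, the limit being discussed can be modeled as a capability
value that carries 32 bits at a time.  The helper names below are made up, and
the second helper stands in for the hypothetical KVM_CAP_GUEST_MEMFD_FLAGS2
mentioned earlier in the thread; nothing here is an existing KVM interface.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a capability whose value travels in 32 bits: bits 32+ are dropped. */
static uint64_t gmem_flags_lo(uint64_t supported)
{
	return supported & UINT32_MAX;
}

/* Hypothetical follow-up capability carrying the upper 32 flag bits. */
static uint64_t gmem_flags_hi(uint64_t supported)
{
	return supported >> 32;
}

/* Userspace stitches the two halves back into one 64-bit flag mask. */
static uint64_t gmem_flags_combine(uint64_t lo, uint64_t hi)
{
	assert(lo <= UINT32_MAX && hi <= UINT32_MAX);
	return lo | (hi << 32);
}
```

A second capability only becomes necessary once a flag above bit 31 exists, so
the cost is deferred until it is actually needed.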

> 2) Userspace has to anyways assume the semantics of each bit position.

Not always.

	uint64_t valid_flags = vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS);
	uint64_t flag;
	int fd;

	for (flag = BIT(0); flag; flag <<= 1) {
		fd = __vm_create_guest_memfd(vm, page_size, flag);
		if (flag & valid_flags) {
			TEST_ASSERT(fd >= 0,
				    "guest_memfd() with flag '0x%lx' should succeed",
				    flag);
			close(fd);
		} else {
			TEST_ASSERT(fd < 0 && errno == EINVAL,
				    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
				    flag);
		}
	}

But pedantry aside, I don't see how this is at all an interesting point.  Yes,
userspace has to know how to use a feature.

> 3) Userspace still has to check for caps for features that carry extra
> UAPI baggage.

That's simply not true.  E.g. see the example with VM types.
 
> KVM_CAP_GUEST_MEMFD_MMAP allows userspace to assume that mmap is
> supported and userspace can just pass in the mmap flag that it anyways
> has to assume.


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-02  0:04                           ` Sean Christopherson
@ 2025-10-02 15:41                             ` Vishal Annapurve
  2025-10-03  0:12                               ` Sean Christopherson
  0 siblings, 1 reply; 55+ messages in thread
From: Vishal Annapurve @ 2025-10-02 15:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Nikita Kalyazin, shivankg

On Wed, Oct 1, 2025 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > >
> > > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > > >
> > > > > > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > > > > > >
> > > > > > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > > >     that we don't need to add a capability every time a new flag comes along,
> > > > > > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > > > > > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > > >     that's a non-issue relatively speaking.
> > > > > > >
> > > > > >
> > > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > > >
> > > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > > >
> > > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > > KVM caps per guest_memfd feature?
> > >
> > > Yes?  No?  It depends on the feature and the actual implementation.  E.g.
> > > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
> >
> > I think I am confused. Is the proposal here as follows?
> > * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> > creation flags.
>
> No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
> supported KVM_CREATE_GUEST_MEMFD flags.  Whether or not there is an associated
> "feature" is irrelevant.  I.e. it's a very literal "these are the supported
> flags".
>
> > * Use KVM caps for guest_memfd features that don't map to any flags.
> >
> > I think in general it would be better to have a KVM cap for each
> > feature irrespective of the flags as the feature may also need
>                                                    ^^^
> > additional UAPIs like IOCTLs.
>
> If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> a CAP is gross overkill.  Even if there are other assets that accompany the new
> flag, there's no reason we couldn't say "this feature exists if XYZ flag is
> supported".
>
> E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> far beyond being able to create a VM of type KVM_X86_TDX_VM.
>

What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
of guest_memfd? That seems more consistent to me in order for
userspace to deduce the supported features and assume flags/ioctls/...
associated with the feature as a group.


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-02 15:41                             ` Vishal Annapurve
@ 2025-10-03  0:12                               ` Sean Christopherson
  2025-10-03  4:10                                 ` Vishal Annapurve
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-10-03  0:12 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm, linux-kernel, Nikita Kalyazin, shivankg

On Thu, Oct 02, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > > > >
> > > > > > > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > > > > > > >
> > > > > > > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > > > >     that we don't need to add a capability every time a new flag comes along,
> > > > > > > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > > > > > > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > > > >     that's a non-issue relatively speaking.
> > > > > > > >
> > > > > > >
> > > > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > > > >
> > > > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > > > >
> > > > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > > > KVM caps per guest_memfd feature?
> > > >
> > > > Yes?  No?  It depends on the feature and the actual implementation.  E.g.
> > > > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
> > >
> > > I think I am confused. Is the proposal here as follows?
> > > * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> > > creation flags.
> >
> > No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
> > supported KVM_CREATE_GUEST_MEMFD flags.  Whether or not there is an associated
> > "feature" is irrelevant.  I.e. it's a very literal "these are the supported
> > flags".
> >
> > > * Use KVM caps for guest_memfd features that don't map to any flags.
> > >
> > > I think in general it would be better to have a KVM cap for each
> > > feature irrespective of the flags as the feature may also need
> >                                                    ^^^
> > > additional UAPIs like IOCTLs.
> >
> > If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> > a CAP is gross overkill.  Even if there are other assets that accompany the new
> > flag, there's no reason we couldn't say "this feature exists if XYZ flag is
> > supported".
> >
> > E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> > KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> > far beyond being able to create a VM of type KVM_X86_TDX_VM.
> >
> 
> What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
> KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
> of guest_memfd?

I'd much prefer to have both.  Describing flags for an ioctl via a bitmask that
doesn't *exactly* match the flags is asking for problems.  At best, it will be
confusing.  E.g. we'll probably end up with code like this:

	gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

	if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
		gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
	if (gmem_caps & KVM_CAP_GUEST_MEMFD_INIT_SHARED)
		gmem_flags |= KVM_CAP_GUEST_MEMFD_INIT_SHARED;

Those types of patterns often lead to typos causing problems (LOL, case in point,
there's a typo above; I'm leaving it to illustrate my point).  That can be largely
solved by userspace via macro shenanigans, but userspace really shouldn't have to
jump through hoops for such a simple thing.

An even worse outcome is if userspace does something like:

	gmem_flags = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

Which might actually work initially, e.g. if KVM_CAP_GUEST_MEMFD_MMAP and
GUEST_MEMFD_FLAG_MMAP have the same value.  But eventually userspace will be sad.
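
A rough sketch of how that goes wrong, under made-up assumptions: the
GMEM_FEAT_* bits below are hypothetical, with bit 1 taken by a feature that has
no creation flag, so the feature bit for INIT_SHARED no longer lines up with
GUEST_MEMFD_FLAG_INIT_SHARED.  The only safe option is an explicit translation
table, which is exactly the kind of hoop userspace shouldn't need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Creation flags, as defined in the uAPI. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)

/* Hypothetical feature bits; bit 1 belongs to a feature without a flag. */
#define GMEM_FEAT_MMAP			(1ULL << 0)
#define GMEM_FEAT_NO_FLAG		(1ULL << 1)
#define GMEM_FEAT_INIT_SHARED		(1ULL << 2)

/* The bit positions have diverged, so feature bits can't be used as flags. */
static_assert(GMEM_FEAT_INIT_SHARED != GUEST_MEMFD_FLAG_INIT_SHARED,
	      "feature bit and flag bit differ in this example");

static const struct {
	uint64_t feat;
	uint64_t flag;
} feat_to_flag[] = {
	{ GMEM_FEAT_MMAP,	 GUEST_MEMFD_FLAG_MMAP },
	{ GMEM_FEAT_INIT_SHARED, GUEST_MEMFD_FLAG_INIT_SHARED },
};

/* Translate a feature mask into the creation flags it implies. */
static uint64_t flags_from_feats(uint64_t feats)
{
	uint64_t flags = 0;
	size_t i;

	for (i = 0; i < sizeof(feat_to_flag) / sizeof(feat_to_flag[0]); i++) {
		if (feats & feat_to_flag[i].feat)
			flags |= feat_to_flag[i].flag;
	}
	return flags;
}
```

Passing the raw feature mask (0x5 in this example) to KVM_CREATE_GUEST_MEMFD
would set a flag bit that was never intended, whereas the translated value is
0x3.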

Another issue is that, while unlikely, we could run out of KVM_CAP_GUEST_MEMFD_CAPS
bits before we run out of flags.

And if we use memory attributes, we're also guaranteed to have at least one gmem
capability that returns a bitmask separately from a dedicated one-size-fits-all
cap, e.g.

	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
		if (vm_memory_attributes)
			return 0;

		return kvm_supported_mem_attributes(kvm);

Side topic, looking at this, I don't think we need KVM_CAP_GUEST_MEMFD_CAPS, I'm
pretty sure we can simply extend KVM_CAP_GUEST_MEMFD.  E.g. 

#define KVM_GUEST_MEMFD_FEAT_BASIC		(1ULL << 0)
#define KVM_GUEST_MEMFD_FEAT_FANCY		(1ULL << 1)

	case KVM_CAP_GUEST_MEMFD:
		return KVM_GUEST_MEMFD_FEAT_BASIC |
		       KVM_GUEST_MEMFD_FEAT_FANCY;
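
A sketch of how userspace might consume that, assuming kernels that predate
such a change report a plain 1 for KVM_CAP_GUEST_MEMFD (which would then read
as FEAT_BASIC only).  The helper names are invented, and the FEAT_* bits are
the placeholders from above, not real uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder feature bits from the example above, not real uAPI. */
#define KVM_GUEST_MEMFD_FEAT_BASIC	(1ULL << 0)
#define KVM_GUEST_MEMFD_FEAT_FANCY	(1ULL << 1)

/* Existing check: any positive value means guest_memfd is supported. */
static bool gmem_supported(int cap_ret)
{
	return cap_ret > 0;
}

/* New check: test individual feature bits in the same return value. */
static bool gmem_has_feat(int cap_ret, uint64_t feat)
{
	assert(feat != 0);
	return cap_ret > 0 && ((uint64_t)cap_ret & feat) == feat;
}
```

Under that assumption, userspace that only tests for a non-zero value keeps
working, and newer userspace running on an older kernel simply sees no FANCY
bit.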

> That seems more consistent to me in order for userspace to deduce the
> supported features and assume flags/ioctls/...  associated with the feature
> as a group.

If we add a feature that comes with a flag, we could always add both, i.e. a
feature flag for KVM_CAP_GUEST_MEMFD along with the natural enumeration for
KVM_CAP_GUEST_MEMFD_FLAGS.  That certainly wouldn't be my first choice, but it's
a possibility, e.g. if it really is the most intuitive solution.  But that's
getting quite hypothetical.


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-03  0:12                               ` Sean Christopherson
@ 2025-10-03  4:10                                 ` Vishal Annapurve
  2025-10-03 16:13                                   ` Sean Christopherson
  0 siblings, 1 reply; 55+ messages in thread
From: Vishal Annapurve @ 2025-10-03  4:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm list, LKML, Nikita Kalyazin, Shivank Garg

On Thu, Oct 2, 2025, 5:12 PM Sean Christopherson <seanjc@google.com> wrote:
>
> > >
> > > If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> > > a CAP is gross overkill.  Even if there are other assets that accompany the new
> > > flag, there's no reason we couldn't say "this feature exists if XYZ flag is
> > > supported".
> > >
> > > E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> > > KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> > > far beyond being able to create a VM of type KVM_X86_TDX_VM.
> > >
> >
> > What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
> > KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
> > of guest_memfd?
>
> I'd much prefer to have both.  Describing flags for an ioctl via a bitmask that
> doesn't *exactly* match the flags is asking for problems.  At best, it will be
> confusing.  E.g. we'll probably end up with code like this:
>
>         gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);
>
>         if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
>                 gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
>         if (gmem_caps & KVM_CAP_GUEST_MEMFD_INIT_SHARED)
>                 gmem_flags |= KVM_CAP_GUEST_MEMFD_INIT_SHARED;
>

No, I actually meant the userspace can just rely on the cap to assume
right flags to be available (not necessarily the same flags as cap
bits).

i.e. Userspace will do something like:
gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
        gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
if (gmem_caps & KVM_CAP_GUEST_MEMFD_HUGETLB)
        gmem_flags |= GUEST_MEMFD_FLAG_HUGETLB | GUEST_MEMFD_FLAG_HUGETLB_2MB;

Userspace has to anyways assume flag values, userspace just needs to
know if a particular feature is available.

> ...
> Another issue is that, while unlikely, we could run out of KVM_CAP_GUEST_MEMFD_CAPS
> bits before we run out of flags.

I would say that's unlikely as I know of at least one feature that
needs multiple flag bits.

>
> And if we use memory attributes, we're also guaranteed to have at least one gmem
> capability that returns a bitmask separately from a dedicated one-size-fits-all
> cap, e.g.
>
>         case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
>                 if (vm_memory_attributes)
>                         return 0;
>
>                 return kvm_supported_mem_attributes(kvm);

For this one, we need a separate dedicated cap.

>
> Side topic, looking at this, I don't think we need KVM_CAP_GUEST_MEMFD_CAPS, I'm
> pretty sure we can simply extend KVM_CAP_GUEST_MEMFD.  E.g.
>
> #define KVM_GUEST_MEMFD_FEAT_BASIC              (1ULL << 0)
> #define KVM_GUEST_MEMFD_FEAT_FANCY              (1ULL << 1)
>
>         case KVM_CAP_GUEST_MEMFD:
>                 return KVM_GUEST_MEMFD_FEAT_BASIC |
>                        KVM_GUEST_MEMFD_FEAT_FANCY;

This scheme seems ok to me.

>
> > That seems more consistent to me in order for userspace to deduce the
> > supported features and assume flags/ioctls/...  associated with the feature
> > as a group.
>
> If we add a feature that comes with a flag, we could always add both, i.e. a
> feature flag for KVM_CAP_GUEST_MEMFD along with the natural enumeration for
> KVM_CAP_GUEST_MEMFD_FLAGS.  That certainly wouldn't be my first choice, but it's
> a possibility, e.g. if it really is the most intuitive solution.  But that's
> getting quite hypothetical.


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-03  4:10                                 ` Vishal Annapurve
@ 2025-10-03 16:13                                   ` Sean Christopherson
  2025-10-03 20:30                                     ` Vishal Annapurve
  0 siblings, 1 reply; 55+ messages in thread
From: Sean Christopherson @ 2025-10-03 16:13 UTC (permalink / raw)
  To: Vishal Annapurve
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm list, LKML, Nikita Kalyazin, Shivank Garg

On Thu, Oct 02, 2025, Vishal Annapurve wrote:
> On Thu, Oct 2, 2025, 5:12 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > > >
> > > > If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> > > > a CAP is gross overkill.  Even if there are other assets that accompany the new
> > > > flag, there's no reason we couldn't say "this feature exists if XYZ flag is
> > > > supported".
> > > >
> > > > E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> > > > KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> > > > far beyond being able to create a VM of type KVM_X86_TDX_VM.
> > > >
> > >
> > > What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
> > > KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
> > > of guest_memfd?
> >
> > I'd much prefer to have both.  Describing flags for an ioctl via a bitmask that
> > doesn't *exactly* match the flags is asking for problems.  At best, it will be
> > confusing.  E.g. we'll probably end up with code like this:
> >
> >         gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);
> >
> >         if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
> >                 gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
> >         if (gmem_caps & KVM_CAP_GUEST_MEMFD_INIT_SHARED)
> >                 gmem_flags |= KVM_CAP_GUEST_MEMFD_INIT_SHARED;
> >
> 
> No, I actually meant the userspace can just rely on the cap to assume
> right flags to be available (not necessarily the same flags as cap
> bits).
> 
> i.e. Userspace will do something like:
> gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);
> 
> if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
>         gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
> if (gmem_caps & KVM_CAP_GUEST_MEMFD_HUGETLB)
>         gmem_flags |= GUEST_MEMFD_FLAG_HUGETLB | GUEST_MEMFD_FLAG_HUGETLB_2MB;

Yes, that's exactly what I said.  But I goofed when I copy+pasted and failed to
do s/KVM_CAP_GUEST_MEMFD_INIT_SHARED/GUEST_MEMFD_FLAG_INIT_SHARED, which is the
type of bug that ideally just can't happen.

Side topic, I'm not at all convinced that this is what we want for KVM's uAPI:

	if (gmem_caps & KVM_CAP_GUEST_MEMFD_HUGETLB)
		gmem_flags |= GUEST_MEMFD_FLAG_HUGETLB | GUEST_MEMFD_FLAG_HUGETLB_2MB;

See https://lore.kernel.org/all/aN_fJEZXo6wkcHOh@google.com.

> Userspace has to anyways assume flag values, userspace just needs to
> know if a particular feature is available.

I don't understand what you mean by "assume flag values".


* Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
  2025-10-03 16:13                                   ` Sean Christopherson
@ 2025-10-03 20:30                                     ` Vishal Annapurve
  0 siblings, 0 replies; 55+ messages in thread
From: Vishal Annapurve @ 2025-10-03 20:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, David Hildenbrand, Patrick Roy, Fuad Tabba,
	Paolo Bonzini, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, kvm list, LKML, Nikita Kalyazin, Shivank Garg

On Fri, Oct 3, 2025 at 9:13 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Oct 02, 2025, Vishal Annapurve wrote:
> > On Thu, Oct 2, 2025, 5:12 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > > >
> > > > > If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> > > > > a CAP is gross overkill.  Even if there are other assets that accompany the new
> > > > > flag, there's no reason we couldn't say "this feature exists if XYZ flag is
> > > > > supported".
> > > > >
> > > > > E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> > > > > KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> > > > > far beyond being able to create a VM of type KVM_X86_TDX_VM.
> > > > >
> > > >
> > > > What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
> > > > KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
> > > > of guest_memfd?
> > >
> > > I'd much prefer to have both.  Describing flags for an ioctl via a bitmask that
> > > doesn't *exactly* match the flags is asking for problems.  At best, it will be
> > > confusing.  E.g. we'll probably end up with code like this:
> > >
> > >         gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);
> > >
> > >         if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
> > >                 gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
> > >         if (gmem_caps & KVM_CAP_GUEST_MEMFD_INIT_SHARED)
> > >                 gmem_flags |= KVM_CAP_GUEST_MEMFD_INIT_SHARED;
> > >
> >
> > No, I actually meant the userspace can just rely on the cap to assume
> > right flags to be available (not necessarily the same flags as cap
> > bits).
> >
> > i.e. Userspace will do something like:
> > gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);
> >
> > if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
> >         gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
> > if (gmem_caps & KVM_CAP_GUEST_MEMFD_HUGETLB)
> >         gmem_flags |= GUEST_MEMFD_FLAG_HUGETLB | GUEST_MEMFD_FLAG_HUGETLB_2MB;
>
> Yes, that's exactly what I said.  But I goofed when I copy+pasted and failed to
> do s/KVM_CAP_GUEST_MEMFD_INIT_SHARED/GUEST_MEMFD_FLAG_INIT_SHARED, which is the
> type of bug that ideally just can't happen.
>
> Side topic, I'm not at all convinced that this is what we want for KVM's uAPI:
>
>         if (gmem_caps & KVM_CAP_GUEST_MEMFD_HUGETLB)
>                 gmem_flags |= GUEST_MEMFD_FLAG_HUGETLB | GUEST_MEMFD_FLAG_HUGETLB_2MB;
>
> See https://lore.kernel.org/all/aN_fJEZXo6wkcHOh@google.com.

Ack, that makes sense to me.

>
> > Userspace has to anyways assume flag values, userspace just needs to
> > know if a particular feature is available.
>
> I don't understand what you mean by "assume flag values".

Ok, I think you covered the explanation of why you would prefer to
have KVM_CAP_GUEST_MEMFD_FLAGS around and I misinterpreted some of it.

One more example with KVM_CAP_GUEST_MEMFD_FLAGS around:

gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);
valid_flags = kvm_check_cap(KVM_CAP_GUEST_MEMFD_FLAGS);

if (gmem_caps & KVM_CAP_GUEST_MEMFD_CONVERSION) {
        // Use single memory backing paths for 4K backing
        if (valid_flags & GUEST_MEMFD_FLAG_MMAP)
                gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
        else
                // error out;
}
if (gmem_caps & KVM_CAP_GUEST_MEMFD_HUGETLB_CONVERSION) {
        // Use single memory backing paths for hugetlb memory backing
        if (valid_flags & GUEST_MEMFD_FLAG_HUGETLB) {
                gmem_flags |= GUEST_MEMFD_FLAG_HUGETLB;
                kvm_create_guest_memfd.huge_page_size_log2 = ...;
        } else
                // error out;
}

Userspace will have to rely on a combination of flags and caps to
decide its control flow instead of just caps. Thinking more about
this, I don't have a strong preference between the two scenarios, i.e. with
or without KVM_CAP_GUEST_MEMFD_FLAGS.


end of thread, other threads:[~2025-10-03 20:30 UTC | newest]

Thread overview: 55+ messages
2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
2025-09-29  8:38   ` David Hildenbrand
2025-09-29  8:57     ` Fuad Tabba
2025-09-29  9:01       ` David Hildenbrand
2025-09-29  9:04   ` Fuad Tabba
2025-09-29  9:43     ` Ackerley Tng
2025-09-29 10:15       ` Patrick Roy
2025-09-29 10:22         ` David Hildenbrand
2025-09-29 10:51           ` Ackerley Tng
2025-09-29 16:55             ` Sean Christopherson
2025-09-30  0:15               ` Sean Christopherson
2025-09-30  8:36                 ` Ackerley Tng
2025-10-01 14:22                 ` Vishal Annapurve
2025-10-01 16:15                   ` Sean Christopherson
2025-10-01 16:31                     ` Vishal Annapurve
2025-10-01 17:16                       ` Sean Christopherson
2025-10-01 22:13                         ` Vishal Annapurve
2025-10-02  0:04                           ` Sean Christopherson
2025-10-02 15:41                             ` Vishal Annapurve
2025-10-03  0:12                               ` Sean Christopherson
2025-10-03  4:10                                 ` Vishal Annapurve
2025-10-03 16:13                                   ` Sean Christopherson
2025-10-03 20:30                                     ` Vishal Annapurve
2025-09-29 16:54       ` Sean Christopherson
2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
2025-09-29  9:12   ` Fuad Tabba
2025-09-29  9:17   ` David Hildenbrand
2025-09-29 10:56   ` Ackerley Tng
2025-09-29 16:58     ` Sean Christopherson
2025-09-30  6:52       ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
2025-09-29  9:18   ` David Hildenbrand
2025-09-29  9:24   ` Fuad Tabba
2025-09-29 11:02   ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
2025-09-29  9:21   ` David Hildenbrand
2025-09-29  9:24   ` Fuad Tabba
2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
2025-09-29  9:24   ` Fuad Tabba
2025-09-29  9:28   ` David Hildenbrand
2025-09-29 11:08   ` Ackerley Tng
2025-09-29 17:32     ` Sean Christopherson
2025-09-30  7:09       ` Ackerley Tng
2025-09-30 14:24         ` Sean Christopherson
2025-10-01 10:18           ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
2025-09-29  9:24   ` Fuad Tabba
2025-09-29  9:28   ` David Hildenbrand
2025-09-29 14:38   ` Ackerley Tng
2025-09-29 18:10     ` Sean Christopherson
2025-09-29 18:35       ` Sean Christopherson
2025-09-30  7:53       ` Ackerley Tng
2025-09-30 14:58         ` Sean Christopherson
2025-10-01 10:26           ` Ackerley Tng
