linux-kselftest.vger.kernel.org archive mirror
* [PATCH v5 0/2] KVM: guest_memfd: use write for population
@ 2025-09-02 11:19 Kalyazin, Nikita
  2025-09-02 11:20 ` [PATCH v5 1/2] KVM: guest_memfd: add generic population via write Kalyazin, Nikita
  2025-09-02 11:20 ` [PATCH v5 2/2] KVM: selftests: update guest_memfd write tests Kalyazin, Nikita
  0 siblings, 2 replies; 3+ messages in thread
From: Kalyazin, Nikita @ 2025-09-02 11:19 UTC
  To: pbonzini@redhat.com, shuah@kernel.org
  Cc: kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, michael.day@amd.com,
	david@redhat.com, jthoughton@google.com, Roy, Patrick,
	Thomson, Jack, Manwaring, Derek, Cali, Marco, Kalyazin, Nikita

[ based on kvm/next ]

Implement guest_memfd allocation and population via the write syscall.
This is useful in non-CoCo use cases where the host can access guest
memory.  Although the same can be achieved by mapping the memory into
userspace and memcpying into it, write is faster because it does not
need to set up page tables and does not take a page fault for every
page, as memcpy would.  Note that the memcpy path cannot be accelerated
with MADV_POPULATE_WRITE, which relies on GUP and is not supported by
guest_memfd.

Populating 512MiB of guest_memfd on an x86 machine:
 - via memcpy: 436 ms
 - via write:  202 ms (-54%)
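
For illustration, the two population paths compared above could look
roughly like this from userspace.  This is a sketch, not code from this
series: populate_via_memcpy() and populate_via_write() are hypothetical
helpers, creating the guest_memfd (with GUEST_MEMFD_FLAG_MMAP for the
memcpy path) is not shown, and offsets and lengths are assumed to be
page-aligned.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the fd and memcpy into it.  The first touch
 * of each page goes through the page fault path.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int populate_via_memcpy(int fd, const void *src, size_t len)
{
	void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (dst == MAP_FAILED)
		return -1;

	memcpy(dst, src, len);
	return munmap(dst, len);
}

/*
 * Hypothetical helper: copy the data with pwrite(), continuing after
 * short writes and retrying on EINTR.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int populate_via_write(int fd, const void *src, size_t len, off_t offset)
{
	const char *p = src;

	while (len > 0) {
		ssize_t ret = pwrite(fd, p, len, offset);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ret == 0) {
			/* No progress: report an error instead of looping forever */
			errno = EIO;
			return -1;
		}
		p += ret;
		offset += ret;
		len -= (size_t)ret;
	}
	return 0;
}
```

With the semantics of this series, a write that runs into an already
populated page comes back short, and the retry in populate_via_write()
would then be expected to fail with ENOSPC.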

v5:
 - Replace the call to the unexported filemap_remove_folio with
   zeroing the bytes that could not be copied
 - Fix checkpatch findings
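
As a rough userspace model of the zeroing change above (not the kernel
code itself; finish_copy() and its parameters are made up for
illustration), a short copy leaves the uncopied tail zeroed instead of
dropping the page:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of the write_end step: 'copied' of the requested 'len' bytes
 * landed in 'page' starting at offset 'from'.  If the copy was short,
 * zero the bytes that could not be copied.  Returns true if the page
 * would be marked as prepared, i.e. if anything was copied at all.
 */
static bool finish_copy(unsigned char *page, size_t from, size_t len,
			size_t copied)
{
	if (copied == 0)
		return false;

	if (copied < len)
		memset(page + from + copied, 0, len - copied);

	return true;
}
```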

v4:
 - https://lore.kernel.org/kvm/20250828153049.3922-1-kalyazin@amazon.com
 - Switch from implementing the write callback to write_iter
 - Remove conditional compilation

v3:
 - https://lore.kernel.org/kvm/20250303130838.28812-1-kalyazin@amazon.com
 - David/Mike D: Only compile support for the write syscall if
   CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.

v2:
 - https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com
 - Switch from an ioctl to the write syscall to implement population

v1:
 - https://lore.kernel.org/kvm/20241024095429.54052-1-kalyazin@amazon.com

Nikita Kalyazin (2):
  KVM: guest_memfd: add generic population via write
  KVM: selftests: update guest_memfd write tests

 .../testing/selftests/kvm/guest_memfd_test.c  | 86 +++++++++++++++++--
 virt/kvm/guest_memfd.c                        | 62 ++++++++++++-
 2 files changed, 141 insertions(+), 7 deletions(-)


base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
-- 
2.50.1


* [PATCH v5 1/2] KVM: guest_memfd: add generic population via write
  2025-09-02 11:19 [PATCH v5 0/2] KVM: guest_memfd: use write for population Kalyazin, Nikita
@ 2025-09-02 11:20 ` Kalyazin, Nikita
  2025-09-02 11:20 ` [PATCH v5 2/2] KVM: selftests: update guest_memfd write tests Kalyazin, Nikita
  1 sibling, 0 replies; 3+ messages in thread
From: Kalyazin, Nikita @ 2025-09-02 11:20 UTC
  To: pbonzini@redhat.com, shuah@kernel.org
  Cc: kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, michael.day@amd.com,
	david@redhat.com, jthoughton@google.com, Roy, Patrick,
	Thomson, Jack, Manwaring, Derek, Cali, Marco, Kalyazin, Nikita

From: Nikita Kalyazin <kalyazin@amazon.com>

The write syscall populates guest_memfd with user-supplied data in a
generic way, i.e. no vendor-specific preparation is performed.  This is
intended for non-CoCo setups where guest memory is not
hardware-encrypted.

The following behaviour is implemented:
 - offset and count must be page-aligned, otherwise the call fails with
   EINVAL
 - if the memory is already allocated, the call will successfully
   populate it
 - if the memory is not allocated, the call will both allocate and
   populate
 - if the memory is already populated, the call will not repopulate it:
   it stops at the first populated page and returns a short count, or
   fails with ENOSPC if nothing was written
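
The per-page checks behind the list above can be modelled in userspace
as follows.  This is only a sketch of the decision logic under stated
assumptions: model_write_begin() and its parameters are hypothetical,
and the folio lookup (which can additionally fail with EFAULT in the
patch) is reduced to a boolean.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the checks applied to each page-sized chunk of a write:
 *  - the position must be page-aligned and the chunk exactly one page
 *  - the chunk must lie within the file
 *  - an already populated page is never overwritten
 * Returns 0 if the chunk may be populated, or a negative errno.
 */
static int model_write_begin(uint64_t pos, uint64_t len, uint64_t file_size,
			     uint64_t page_size, bool populated)
{
	if (pos % page_size != 0 || len != page_size)
		return -EINVAL;

	if (pos + len > file_size)
		return -EINVAL;

	if (populated)
		return -ENOSPC;

	return 0;
}
```

For example, with 4KiB pages and an 8KiB file, a chunk at offset 0 is
accepted once, a chunk at offset 1 is rejected with EINVAL, and a second
write to offset 0 is rejected with ENOSPC until the page is punched out.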

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 virt/kvm/guest_memfd.c | 64 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 08a6bc7d25b6..a2e86ec13e4b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -379,7 +379,9 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
 }
 
 static struct file_operations kvm_gmem_fops = {
-	.mmap		= kvm_gmem_mmap,
+	.mmap           = kvm_gmem_mmap,
+	.llseek         = default_llseek,
+	.write_iter     = generic_perform_write,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -390,6 +392,63 @@ void kvm_gmem_init(struct module *module)
 	kvm_gmem_fops.owner = module;
 }
 
+static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
+				     struct address_space *mapping,
+				     loff_t pos, unsigned int len,
+				     struct folio **foliop,
+				     void **fsdata)
+{
+	struct file *file = kiocb->ki_filp;
+	pgoff_t index = pos >> PAGE_SHIFT;
+	struct folio *folio;
+
+	if (!PAGE_ALIGNED(pos) || len != PAGE_SIZE)
+		return -EINVAL;
+
+	if (pos + len > i_size_read(file_inode(file)))
+		return -EINVAL;
+
+	folio = kvm_gmem_get_folio(file_inode(file), index);
+	if (IS_ERR(folio))
+		return -EFAULT;
+
+	if (WARN_ON_ONCE(folio_test_large(folio))) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return -EFAULT;
+	}
+
+	if (folio_test_uptodate(folio)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return -ENOSPC;
+	}
+
+	*foliop = folio;
+	return 0;
+}
+
+static int kvm_kmem_gmem_write_end(const struct kiocb *kiocb,
+				   struct address_space *mapping,
+				   loff_t pos, unsigned int len,
+				   unsigned int copied,
+				   struct folio *folio, void *fsdata)
+{
+	if (copied) {
+		if (copied < len) {
+			unsigned int from = pos & (PAGE_SIZE - 1);
+
+			folio_zero_range(folio, from + copied, len - copied);
+		}
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	folio_unlock(folio);
+	folio_put(folio);
+
+	return copied;
+}
+
 static int kvm_gmem_migrate_folio(struct address_space *mapping,
 				  struct folio *dst, struct folio *src,
 				  enum migrate_mode mode)
@@ -442,6 +501,8 @@ static void kvm_gmem_free_folio(struct folio *folio)
 
 static const struct address_space_operations kvm_gmem_aops = {
 	.dirty_folio = noop_dirty_folio,
+	.write_begin = kvm_kmem_gmem_write_begin,
+	.write_end = kvm_kmem_gmem_write_end,
 	.migrate_folio	= kvm_gmem_migrate_folio,
 	.error_remove_folio = kvm_gmem_error_folio,
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
@@ -489,6 +550,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	}
 
 	file->f_flags |= O_LARGEFILE;
+	file->f_mode |= FMODE_LSEEK | FMODE_PWRITE;
 
 	inode = file->f_inode;
 	WARN_ON(file->f_mapping != inode->i_mapping);
-- 
2.50.1


* [PATCH v5 2/2] KVM: selftests: update guest_memfd write tests
  2025-09-02 11:19 [PATCH v5 0/2] KVM: guest_memfd: use write for population Kalyazin, Nikita
  2025-09-02 11:20 ` [PATCH v5 1/2] KVM: guest_memfd: add generic population via write Kalyazin, Nikita
@ 2025-09-02 11:20 ` Kalyazin, Nikita
  1 sibling, 0 replies; 3+ messages in thread
From: Kalyazin, Nikita @ 2025-09-02 11:20 UTC
  To: pbonzini@redhat.com, shuah@kernel.org
  Cc: kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, michael.day@amd.com,
	david@redhat.com, jthoughton@google.com, Roy, Patrick,
	Thomson, Jack, Manwaring, Derek, Cali, Marco, Kalyazin, Nikita

From: Nikita Kalyazin <kalyazin@amazon.com>

Update the tests to reflect that the write syscall is now implemented
for guest_memfd: read and pread must still fail, while write and pwrite
are covered by a dedicated test.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 86 +++++++++++++++++--
 1 file changed, 80 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index b3ca6737f304..1236e31f5041 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -24,18 +24,91 @@
 #include "test_util.h"
 #include "ucall_common.h"
 
-static void test_file_read_write(int fd)
+static void test_file_read(int fd)
 {
 	char buf[64];
 
 	TEST_ASSERT(read(fd, buf, sizeof(buf)) < 0,
 		    "read on a guest_mem fd should fail");
-	TEST_ASSERT(write(fd, buf, sizeof(buf)) < 0,
-		    "write on a guest_mem fd should fail");
 	TEST_ASSERT(pread(fd, buf, sizeof(buf), 0) < 0,
 		    "pread on a guest_mem fd should fail");
-	TEST_ASSERT(pwrite(fd, buf, sizeof(buf), 0) < 0,
-		    "pwrite on a guest_mem fd should fail");
+}
+
+static void test_file_write(int fd, size_t total_size)
+{
+	size_t page_size = getpagesize();
+	void *buf = NULL;
+	int ret;
+
+	ret = posix_memalign(&buf, page_size, total_size);
+	TEST_ASSERT_EQ(ret, 0);
+
+	/* Check that argument validation works as expected */
+
+	ret = pwrite(fd, buf, page_size - 1, 0);
+	TEST_ASSERT(ret == -1, "write unaligned count on a guest_mem fd should fail");
+	TEST_ASSERT_EQ(errno, EINVAL);
+
+	ret = pwrite(fd, buf, page_size, 1);
+	TEST_ASSERT(ret == -1, "write unaligned offset on a guest_mem fd should fail");
+	TEST_ASSERT_EQ(errno, EINVAL);
+
+	ret = pwrite(fd, buf, page_size, total_size);
+	TEST_ASSERT(ret == -1, "writing past the file size on a guest_mem fd should fail");
+	TEST_ASSERT_EQ(errno, EINVAL);
+
+	ret = pwrite(fd, NULL, page_size, 0);
+	TEST_ASSERT(ret == -1, "supplying a NULL buffer when writing a guest_mem fd should fail");
+	TEST_ASSERT_EQ(errno, EFAULT);
+
+	/* Check double population is not allowed */
+
+	ret = pwrite(fd, buf, page_size, 0);
+	TEST_ASSERT(ret == page_size, "page-aligned write on a guest_mem fd should succeed");
+
+	ret = pwrite(fd, buf, page_size, 0);
+	TEST_ASSERT(ret == -1, "write on already populated guest_mem fd should fail");
+	TEST_ASSERT_EQ(errno, ENOSPC);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, page_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
+
+	/* Check population is allowed again after punching a hole */
+
+	ret = pwrite(fd, buf, page_size, 0);
+	TEST_ASSERT(ret == page_size,
+		"page-aligned write on a punched guest_mem fd should succeed");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, page_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
+
+	/* Check population of already allocated memory is allowed */
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, page_size);
+	TEST_ASSERT(!ret, "fallocate with aligned offset and size should succeed");
+
+	ret = pwrite(fd, buf, page_size, 0);
+	TEST_ASSERT(ret == page_size, "write on a preallocated guest_mem fd should succeed");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, page_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
+
+	/* Check population works until an already populated page is encountered */
+
+	ret = pwrite(fd, buf, total_size, 0);
+	TEST_ASSERT(ret == total_size, "page-aligned write on a guest_mem fd should succeed");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, page_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
+
+	ret = pwrite(fd, buf, total_size, 0);
+	TEST_ASSERT(ret == page_size, "write on a guest_mem fd should not overwrite data");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, total_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
+
+
+	free(buf);
 }
 
 static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
@@ -281,7 +354,8 @@ static void test_guest_memfd(unsigned long vm_type)
 
 	fd = vm_create_guest_memfd(vm, total_size, flags);
 
-	test_file_read_write(fd);
+	test_file_read(fd);
+	test_file_write(fd, total_size);
 
 	if (flags & GUEST_MEMFD_FLAG_MMAP) {
 		test_mmap_supported(fd, page_size, total_size);
-- 
2.50.1

