public inbox for linux-trace-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ackerley Tng via B4 Relay <devnull+ackerleytng.google.com@kernel.org>
To: aik@amd.com, andrew.jones@linux.dev, binbin.wu@linux.intel.com,
	 brauner@kernel.org, chao.p.peng@linux.intel.com,
	david@kernel.org,  ira.weiny@intel.com, jmattson@google.com,
	jthoughton@google.com,  michael.roth@amd.com, oupton@kernel.org,
	pankaj.gupta@amd.com,  qperret@google.com,
	rick.p.edgecombe@intel.com, rientjes@google.com,
	 shivankg@amd.com, steven.price@arm.com, tabba@google.com,
	 willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com,
	 forkloop@google.com, pratyush@kernel.org,
	suzuki.poulose@arm.com,  aneesh.kumar@kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Sean Christopherson <seanjc@google.com>,
	Thomas Gleixner <tglx@kernel.org>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Masami Hiramatsu <mhiramat@kernel.org>,
	 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	 Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	 Shuah Khan <shuah@kernel.org>,
	Vishal Annapurve <vannapurve@google.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Chris Li <chrisl@kernel.org>,  Kairui Song <kasong@tencent.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	 Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	 Barry Song <baohua@kernel.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 Youngjun Park <youngjun.park@lge.com>,
	Qi Zheng <qi.zheng@linux.dev>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	Kiryl Shutsemau <kas@kernel.org>,  Jason Gunthorpe <jgg@ziepe.ca>,
	Vlastimil Babka <vbabka@kernel.org>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
	 linux-coco@lists.linux.dev,
	Ackerley Tng <ackerleytng@google.com>
Subject: [PATCH RFC v5 11/53] KVM: guest_memfd: Ensure pages are not in use before conversion
Date: Tue, 28 Apr 2026 16:25:06 -0700	[thread overview]
Message-ID: <20260428-gmem-inplace-conversion-v5-11-d8608ccfca22@google.com> (raw)
In-Reply-To: <20260428-gmem-inplace-conversion-v5-0-d8608ccfca22@google.com>

From: Ackerley Tng <ackerleytng@google.com>

When converting memory to private in guest_memfd, it is necessary to ensure
that the pages are not currently being accessed by any other part of the
kernel or userspace to avoid any current user writing to guest private
memory.

guest_memfd checks for unexpected refcounts to determine whether a page is
still in use. The only expected refcounts after unmapping the range
requested for conversion are those that are held by guest_memfd itself.

Update the kvm_memory_attributes2 structure to include an error_offset
field. This allows KVM to report the exact offset where a conversion
failed to userspace. If the safety check fails, return -EAGAIN and copy
the error_offset back to userspace so that it can potentially retry the
operation or handle the failure gracefully.

Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 include/uapi/linux/kvm.h |  3 ++-
 virt/kvm/guest_memfd.c   | 65 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e6bbf68a83813..0b55258573d3d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1658,7 +1658,8 @@ struct kvm_memory_attributes2 {
 	__u64 size;
 	__u64 attributes;
 	__u64 flags;
-	__u64 reserved[12];
+	__u64 error_offset;
+	__u64 reserved[11];
 };
 
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 9a26eca717047..e87a2b72ff802 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -584,9 +584,42 @@ static int kvm_gmem_mas_preallocate(struct ma_state *mas, u64 attributes,
 	return mas_preallocate(mas, xa_mk_value(attributes), GFP_KERNEL);
 }
 
+static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
+					    size_t nr_pages, pgoff_t *err_index)
+{
+	struct address_space *mapping = inode->i_mapping;
+	const int filemap_get_folios_refcount = 1;
+	pgoff_t last = start + nr_pages - 1;
+	struct folio_batch fbatch;
+	bool safe = true;
+	int i;
+
+	folio_batch_init(&fbatch);
+	while (safe && filemap_get_folios(mapping, &start, last, &fbatch)) {
+
+		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
+			struct folio *folio = fbatch.folios[i];
+
+			if (folio_ref_count(folio) !=
+			    folio_nr_pages(folio) + filemap_get_folios_refcount) {
+				safe = false;
+				*err_index = folio->index;
+				break;
+			}
+		}
+
+		folio_batch_release(&fbatch);
+		cond_resched();
+	}
+
+	return safe;
+}
+
 static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
-				     size_t nr_pages, uint64_t attrs)
+				     size_t nr_pages, uint64_t attrs,
+				     pgoff_t *err_index)
 {
+	bool to_private = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 	struct address_space *mapping = inode->i_mapping;
 	struct gmem_inode *gi = GMEM_I(inode);
 	pgoff_t end = start + nr_pages;
@@ -600,8 +633,21 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
 
 	mas_init(&mas, mt, start);
 	r = kvm_gmem_mas_preallocate(&mas, attrs, start, nr_pages);
-	if (r)
+	if (r) {
+		*err_index = start;
 		goto out;
+	}
+
+	if (to_private) {
+		unmap_mapping_pages(mapping, start, nr_pages, false);
+
+		if (!kvm_gmem_is_safe_for_conversion(inode, start, nr_pages,
+						     err_index)) {
+			mas_destroy(&mas);
+			r = -EAGAIN;
+			goto out;
+		}
+	}
 
 	/*
 	 * From this point on guest_memfd has performed necessary
@@ -621,9 +667,10 @@ static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
 	struct gmem_file *f = file->private_data;
 	struct inode *inode = file_inode(file);
 	struct kvm_memory_attributes2 attrs;
+	pgoff_t err_index;
 	size_t nr_pages;
 	pgoff_t index;
-	int i;
+	int i, r;
 
 	if (copy_from_user(&attrs, argp, sizeof(attrs)))
 		return -EFAULT;
@@ -647,8 +694,16 @@ static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
 
 	nr_pages = attrs.size >> PAGE_SHIFT;
 	index = attrs.offset >> PAGE_SHIFT;
-	return __kvm_gmem_set_attributes(inode, index, nr_pages,
-					 attrs.attributes);
+	r = __kvm_gmem_set_attributes(inode, index, nr_pages, attrs.attributes,
+				      &err_index);
+	if (r) {
+		attrs.error_offset = ((uint64_t)err_index) << PAGE_SHIFT;
+
+		if (copy_to_user(argp, &attrs, sizeof(attrs)))
+			return -EFAULT;
+	}
+
+	return r;
 }
 
 static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,

-- 
2.54.0.545.g6539524ca2-goog



  parent reply	other threads:[~2026-04-28 23:25 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28 23:24 [PATCH RFC v5 00/53] guest_memfd: In-place conversion support Ackerley Tng via B4 Relay
2026-04-28 23:24 ` [PATCH RFC v5 01/53] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings Ackerley Tng via B4 Relay
2026-04-28 23:24 ` [PATCH RFC v5 02/53] KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES Ackerley Tng via B4 Relay
2026-04-28 23:24 ` [PATCH RFC v5 03/53] KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined Ackerley Tng via B4 Relay
2026-04-28 23:24 ` [PATCH RFC v5 04/53] KVM: Stub in ability to disable per-VM memory attribute tracking Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 05/53] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 06/53] KVM: x86/mmu: Bug the VM if gmem attributes are queried to determine max mapping level Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 07/53] KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 08/53] KVM: guest_memfd: Only prepare folios for private pages Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 09/53] KVM: Move kvm_supported_mem_attributes() to kvm_host.h Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 10/53] KVM: guest_memfd: Add basic support for KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng via B4 Relay
2026-04-28 23:25 ` Ackerley Tng via B4 Relay [this message]
2026-04-28 23:25 ` [PATCH RFC v5 12/53] KVM: guest_memfd: Call arch invalidate hooks on conversion Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 13/53] KVM: guest_memfd: Return early if range already has requested attributes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 14/53] KVM: guest_memfd: Advertise KVM_SET_MEMORY_ATTRIBUTES2 ioctl Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 15/53] KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion safety check Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 16/53] KVM: guest_memfd: Use actual size for invalidation in kvm_gmem_release() Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 17/53] KVM: guest_memfd: Determine invalidation filter from memory attributes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 18/53] KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86 Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 19/53] KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 20/53] KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 21/53] KVM: guest_memfd: Introduce default handlers for content modes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 22/53] KVM: guest_memfd: Apply content modes while setting memory attributes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 23/53] KVM: x86: Support SW_PROTECTED_VM in applying content modes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 24/53] KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE Ackerley Tng via B4 Relay
2026-04-28 23:40   ` Ackerley Tng
2026-04-28 23:25 ` [PATCH RFC v5 25/53] KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 26/53] KVM: x86: Support SNP and TDX applying content modes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 27/53] KVM: x86: Bug CoCo VM on page fault before finalizing Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 28/53] KVM: Add CAP to enumerate supported SET_MEMORY_ATTRIBUTES2 flags Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 29/53] KVM: selftests: Create gmem fd before "regular" fd when adding memslot Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 30/53] KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset} Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 31/53] KVM: selftests: Add support for mmap() on guest_memfd in core library Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 32/53] KVM: selftests: Add selftests global for guest memory attributes capability Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 33/53] KVM: selftests: Add helpers for calling ioctls on guest_memfd Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 34/53] KVM: selftests: Test basic single-page conversion flow Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 35/53] KVM: selftests: Test conversion flow when INIT_SHARED Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 36/53] KVM: selftests: Test conversion precision in guest_memfd Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 37/53] KVM: selftests: Test conversion before allocation Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 38/53] KVM: selftests: Convert with allocated folios in different layouts Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 39/53] KVM: selftests: Test that truncation does not change shared/private status Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 40/53] KVM: selftests: Test that shared/private status is consistent across processes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 41/53] KVM: selftests: Test conversion with elevated page refcount Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 42/53] KVM: selftests: Test that conversion to private does not support ZERO Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 43/53] KVM: selftests: Support checking that data not equal expected Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 44/53] KVM: selftests: Test that not specifying a conversion flag scrambles memory contents Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 45/53] KVM: selftests: Reset shared memory after hole-punching Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 46/53] KVM: selftests: Provide function to look up guest_memfd details from gpa Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 47/53] KVM: selftests: Provide common function to set memory attributes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 48/53] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 49/53] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 50/53] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 51/53] KVM: selftests: Add script to exercise private_mem_conversions_test Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 52/53] KVM: selftests: Update pre-fault test to work with per-guest_memfd attributes Ackerley Tng via B4 Relay
2026-04-28 23:25 ` [PATCH RFC v5 53/53] KVM: selftests: Update private memory exits test to work with per-gmem attributes Ackerley Tng via B4 Relay
2026-04-28 23:33 ` [POC PATCH 0/6] guest_memfd in-place conversion selftests for SNP Ackerley Tng
2026-04-28 23:33   ` [POC PATCH 1/6] KVM: selftests: Initialize guest_memfd with INIT_SHARED Ackerley Tng
2026-04-28 23:33   ` [POC PATCH 2/6] KVM: selftests: Use guest_memfd memory contents in-place for SNP launch update Ackerley Tng
2026-04-28 23:33   ` [POC PATCH 3/6] KVM: selftests: Make guest_code_xsave more friendly Ackerley Tng
2026-04-28 23:33   ` [POC PATCH 4/6] KVM: selftests: Allow specifying CoCo-privateness while mapping a page Ackerley Tng
2026-04-28 23:33   ` [POC PATCH 5/6] KVM: selftests: Test conversions for SNP Ackerley Tng
2026-04-28 23:33   ` [POC PATCH 6/6] KVM: selftests: Test content modes ZERO and PRESERVE " Ackerley Tng
2026-04-29 15:06 ` [PATCH RFC v5 00/53] guest_memfd: In-place conversion support Sean Christopherson
2026-04-29 23:51 ` Michael Roth
2026-04-30 23:51   ` Ackerley Tng
2026-05-01 22:21     ` Ackerley Tng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260428-gmem-inplace-conversion-v5-11-d8608ccfca22@google.com \
    --to=devnull+ackerleytng.google.com@kernel.org \
    --cc=ackerleytng@google.com \
    --cc=aik@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.jones@linux.dev \
    --cc=aneesh.kumar@kernel.org \
    --cc=axelrasmussen@google.com \
    --cc=baohua@kernel.org \
    --cc=bhe@redhat.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chrisl@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@kernel.org \
    --cc=forkloop@google.com \
    --cc=hpa@zytor.com \
    --cc=ira.weiny@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=jmattson@google.com \
    --cc=jthoughton@google.com \
    --cc=kas@kernel.org \
    --cc=kasong@tencent.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=michael.roth@amd.com \
    --cc=mingo@redhat.com \
    --cc=nphamcs@gmail.com \
    --cc=oupton@kernel.org \
    --cc=pankaj.gupta@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=pratyush@kernel.org \
    --cc=qi.zheng@linux.dev \
    --cc=qperret@google.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=seanjc@google.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=shivankg@amd.com \
    --cc=shuah@kernel.org \
    --cc=skhan@linuxfoundation.org \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=tglx@kernel.org \
    --cc=vannapurve@google.com \
    --cc=vbabka@kernel.org \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=wyihan@google.com \
    --cc=x86@kernel.org \
    --cc=yan.y.zhao@intel.com \
    --cc=youngjun.park@lge.com \
    --cc=yuanchu@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox