From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: ackerleytng@google.com, aik@amd.com, andrew.jones@linux.dev,
binbin.wu@linux.intel.com, brauner@kernel.org,
chao.p.peng@linux.intel.com, david@kernel.org,
ira.weiny@intel.com, jmattson@google.com, jthoughton@google.com,
michael.roth@amd.com, oupton@kernel.org, pankaj.gupta@amd.com,
qperret@google.com, rick.p.edgecombe@intel.com,
rientjes@google.com, shivankg@amd.com, steven.price@arm.com,
tabba@google.com, willy@infradead.org, wyihan@google.com,
yan.y.zhao@intel.com, forkloop@google.com, pratyush@kernel.org,
suzuki.poulose@arm.com, aneesh.kumar@kernel.org,
liam@infradead.org, Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Shuah Khan <shuah@kernel.org>,
Vishal Annapurve <vannapurve@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Youngjun Park <youngjun.park@lge.com>,
Qi Zheng <qi.zheng@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Kiryl Shutsemau <kas@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
linux-coco@lists.linux.dev
Subject: Re: [PATCH v7 10/42] KVM: guest_memfd: Ensure pages are not in use before conversion
Date: Mon, 8 Jun 2026 10:55:28 +0200 [thread overview]
Message-ID: <509f9a66-5ae9-4c05-bef1-ced89fd29bf0@kernel.org> (raw)
In-Reply-To: <20260522-gmem-inplace-conversion-v7-10-2f0fae496530@google.com>
On 5/23/26 02:17, Ackerley Tng via B4 Relay wrote:
> From: Ackerley Tng <ackerleytng@google.com>
>
> When converting memory to private in guest_memfd, it is necessary to ensure
> that the pages are not currently being accessed by any other part of the
> kernel or userspace to avoid any current user writing to guest private
> memory.
>
> guest_memfd checks for unexpected refcounts to determine whether a page is
> still in use. The only expected refcounts after unmapping the range
> requested for conversion are those that are held by guest_memfd itself.
Is it sufficient to only check, and not also freeze the refcount? (i.e.
using folio_ref_freeze()), because without freezing, anything (e.g.
compaction's pfn-based scanner) could do a speculative folio_try_get() and
the checked refcount becomes stale.
Might be ok if we know that no such speculative increment can result in
actually touching the page contents, and the extra refcount and something
inspecting the struct folio won't interfere with anything else. Then it
could be just a comment mentioning why it's safe.
IIRC the compaction's scanning can result in a migration here so it's
probably ok?
> Update the kvm_memory_attributes2 structure to include an error_offset
> field. This allows KVM to report the exact offset where a conversion
> failed to userspace. If the safety check fails, return -EAGAIN and copy
> the error_offset back to userspace so that it can potentially retry the
> operation or handle the failure gracefully.
>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Co-developed-by: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Vishal Annapurve <vannapurve@google.com>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
> include/uapi/linux/kvm.h | 3 ++-
> virt/kvm/guest_memfd.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 65 insertions(+), 6 deletions(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index e6bbf68a83813..0b55258573d3d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1658,7 +1658,8 @@ struct kvm_memory_attributes2 {
> __u64 size;
> __u64 attributes;
> __u64 flags;
> - __u64 reserved[12];
> + __u64 error_offset;
> + __u64 reserved[11];
> };
>
> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 426917d22a2b6..2767992955752 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -572,9 +572,45 @@ static int kvm_gmem_mas_preallocate(struct ma_state *mas, u64 attributes,
> return mas_preallocate(mas, xa_mk_value(attributes), GFP_KERNEL);
> }
>
> +static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
> + size_t nr_pages, pgoff_t *err_index)
> +{
> + struct address_space *mapping = inode->i_mapping;
> + const int filemap_get_folios_refcount = 1;
> + pgoff_t last = start + nr_pages - 1;
> + struct folio_batch fbatch;
> + bool safe = true;
> + pgoff_t next;
> + int i;
> +
> + folio_batch_init(&fbatch);
> +
> + next = start;
> + while (safe && filemap_get_folios(mapping, &next, last, &fbatch)) {
> +
> + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> + struct folio *folio = fbatch.folios[i];
> +
> + if (folio_ref_count(folio) !=
> + folio_nr_pages(folio) + filemap_get_folios_refcount) {
> + safe = false;
> + *err_index = max(start, folio->index);
> + break;
> + }
> + }
> +
> + folio_batch_release(&fbatch);
> + cond_resched();
> + }
> +
> + return safe;
> +}
> +
> static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> - size_t nr_pages, uint64_t attrs)
> + size_t nr_pages, uint64_t attrs,
> + pgoff_t *err_index)
> {
> + bool to_private = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> struct address_space *mapping = inode->i_mapping;
> struct gmem_inode *gi = GMEM_I(inode);
> pgoff_t end = start + nr_pages;
> @@ -588,8 +624,21 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
>
> mas_init(&mas, mt, start);
> r = kvm_gmem_mas_preallocate(&mas, attrs, start, nr_pages);
> - if (r)
> + if (r) {
> + *err_index = start;
> goto out;
> + }
> +
> + if (to_private) {
> + unmap_mapping_pages(mapping, start, nr_pages, false);
> +
> + if (!kvm_gmem_is_safe_for_conversion(inode, start, nr_pages,
> + err_index)) {
> + mas_destroy(&mas);
> + r = -EAGAIN;
> + goto out;
> + }
> + }
>
> /*
> * From this point on guest_memfd has performed necessary
> @@ -609,9 +658,10 @@ static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
> struct gmem_file *f = file->private_data;
> struct inode *inode = file_inode(file);
> struct kvm_memory_attributes2 attrs;
> + pgoff_t err_index;
> size_t nr_pages;
> pgoff_t index;
> - int i;
> + int i, r;
>
> if (copy_from_user(&attrs, argp, sizeof(attrs)))
> return -EFAULT;
> @@ -635,8 +685,16 @@ static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
>
> nr_pages = attrs.size >> PAGE_SHIFT;
> index = attrs.offset >> PAGE_SHIFT;
> - return __kvm_gmem_set_attributes(inode, index, nr_pages,
> - attrs.attributes);
> + r = __kvm_gmem_set_attributes(inode, index, nr_pages, attrs.attributes,
> + &err_index);
> + if (r) {
> + attrs.error_offset = ((uint64_t)err_index) << PAGE_SHIFT;
> +
> + if (copy_to_user(argp, &attrs, sizeof(attrs)))
> + return -EFAULT;
> + }
> +
> + return r;
> }
>
> static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
>
next prev parent reply other threads:[~2026-06-08 8:55 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-23 0:17 [PATCH v7 00/42] guest_memfd: In-place conversion support Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 01/42] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 02/42] KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 03/42] KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 04/42] KVM: Stub in ability to disable per-VM memory attribute tracking Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 05/42] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 06/42] KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages Ackerley Tng via B4 Relay
2026-06-02 8:55 ` Suzuki K Poulose
2026-06-02 9:10 ` Suzuki K Poulose
2026-06-02 22:41 ` Ackerley Tng
2026-06-03 8:58 ` Suzuki K Poulose
2026-06-03 13:51 ` Michael Roth
2026-06-02 20:46 ` Ackerley Tng
2026-06-03 13:54 ` Michael Roth
2026-05-23 0:17 ` [PATCH v7 08/42] KVM: Move kvm_supported_mem_attributes() to kvm_host.h Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 09/42] KVM: guest_memfd: Add base support for KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng via B4 Relay
2026-06-01 23:14 ` Michael Roth
2026-05-23 0:17 ` [PATCH v7 10/42] KVM: guest_memfd: Ensure pages are not in use before conversion Ackerley Tng via B4 Relay
2026-06-08 8:55 ` Vlastimil Babka (SUSE) [this message]
2026-05-23 0:17 ` [PATCH v7 11/42] KVM: guest_memfd: Call arch invalidate hooks on conversion Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 12/42] KVM: guest_memfd: Return early if range already has requested attributes Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 13/42] KVM: guest_memfd: Advertise KVM_SET_MEMORY_ATTRIBUTES2 ioctl Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 14/42] KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion safety check Ackerley Tng via B4 Relay
2026-06-08 8:45 ` Vlastimil Babka (SUSE)
2026-05-23 0:17 ` [PATCH v7 15/42] KVM: guest_memfd: Use actual size for invalidation in kvm_gmem_release() Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 16/42] KVM: guest_memfd: Determine invalidation filter from memory attributes Ackerley Tng via B4 Relay
2026-05-23 0:17 ` [PATCH v7 17/42] KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86 Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 18/42] KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 19/42] KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 20/42] KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE Ackerley Tng via B4 Relay
2026-06-04 15:29 ` Suzuki K Poulose
2026-06-04 19:05 ` Ackerley Tng
2026-06-05 8:54 ` Suzuki K Poulose
2026-06-04 20:11 ` Michael Roth
2026-06-05 9:06 ` Suzuki K Poulose
2026-05-23 0:18 ` [PATCH v7 21/42] KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 22/42] KVM: selftests: Create gmem fd before "regular" fd when adding memslot Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 23/42] KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset} Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 24/42] KVM: selftests: Add support for mmap() on guest_memfd in core library Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 25/42] KVM: selftests: Add selftests global for guest memory attributes capability Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 26/42] KVM: selftests: Add helpers for calling ioctls on guest_memfd Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 27/42] KVM: selftests: Test basic single-page conversion flow Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 28/42] KVM: selftests: Test conversion flow when INIT_SHARED Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 29/42] KVM: selftests: Test conversion precision in guest_memfd Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 30/42] KVM: selftests: Test conversion before allocation Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 31/42] KVM: selftests: Convert with allocated folios in different layouts Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 32/42] KVM: selftests: Test that truncation does not change shared/private status Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 33/42] KVM: selftests: Test that shared/private status is consistent across processes Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 34/42] KVM: selftests: Test conversion with elevated page refcount Ackerley Tng via B4 Relay
2026-06-02 21:26 ` Askar Safin
2026-05-23 0:18 ` [PATCH v7 35/42] KVM: selftests: Reset shared memory after hole-punching Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 36/42] KVM: selftests: Provide function to look up guest_memfd details from gpa Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 37/42] KVM: selftests: Provide common function to set memory attributes Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 38/42] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 39/42] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 40/42] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 41/42] KVM: selftests: Add script to exercise private_mem_conversions_test Ackerley Tng via B4 Relay
2026-05-23 0:18 ` [PATCH v7 42/42] KVM: selftests: Update private memory exits test to work with per-gmem attributes Ackerley Tng via B4 Relay
2026-06-03 21:27 ` [PATCH v7 00/42] guest_memfd: In-place conversion support Ackerley Tng
2026-06-04 20:20 ` Sean Christopherson
2026-06-04 21:14 ` Ackerley Tng
2026-06-05 18:27 ` Sean Christopherson
2026-06-05 13:41 ` [POC] KVM: selftests: Verify conversion works with TDX Ackerley Tng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=509f9a66-5ae9-4c05-bef1-ced89fd29bf0@kernel.org \
--to=vbabka@kernel.org \
--cc=ackerleytng@google.com \
--cc=aik@amd.com \
--cc=akpm@linux-foundation.org \
--cc=andrew.jones@linux.dev \
--cc=aneesh.kumar@kernel.org \
--cc=axelrasmussen@google.com \
--cc=baohua@kernel.org \
--cc=bhe@redhat.com \
--cc=binbin.wu@linux.intel.com \
--cc=bp@alien8.de \
--cc=brauner@kernel.org \
--cc=chao.p.peng@linux.intel.com \
--cc=chrisl@kernel.org \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=david@kernel.org \
--cc=forkloop@google.com \
--cc=hpa@zytor.com \
--cc=ira.weiny@intel.com \
--cc=jgg@ziepe.ca \
--cc=jmattson@google.com \
--cc=jthoughton@google.com \
--cc=kas@kernel.org \
--cc=kasong@tencent.com \
--cc=kvm@vger.kernel.org \
--cc=liam@infradead.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=michael.roth@amd.com \
--cc=mingo@redhat.com \
--cc=nphamcs@gmail.com \
--cc=oupton@kernel.org \
--cc=pankaj.gupta@amd.com \
--cc=pbonzini@redhat.com \
--cc=pratyush@kernel.org \
--cc=qi.zheng@linux.dev \
--cc=qperret@google.com \
--cc=rick.p.edgecombe@intel.com \
--cc=rientjes@google.com \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=shakeel.butt@linux.dev \
--cc=shikemeng@huaweicloud.com \
--cc=shivankg@amd.com \
--cc=shuah@kernel.org \
--cc=skhan@linuxfoundation.org \
--cc=steven.price@arm.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=tglx@kernel.org \
--cc=vannapurve@google.com \
--cc=weixugc@google.com \
--cc=willy@infradead.org \
--cc=wyihan@google.com \
--cc=x86@kernel.org \
--cc=yan.y.zhao@intel.com \
--cc=youngjun.park@lge.com \
--cc=yuanchu@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox