From: Ackerley Tng <ackerleytng@google.com>
To: aik@amd.com, andrew.jones@linux.dev, binbin.wu@linux.intel.com,
brauner@kernel.org, chao.p.peng@linux.intel.com,
david@kernel.org, ira.weiny@intel.com, jmattson@google.com,
jroedel@suse.de, jthoughton@google.com, michael.roth@amd.com,
oupton@kernel.org, pankaj.gupta@amd.com, qperret@google.com,
rick.p.edgecombe@intel.com, rientjes@google.com,
shivankg@amd.com, steven.price@arm.com, tabba@google.com,
willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com,
forkloop@google.com, pratyush@kernel.org,
suzuki.poulose@arm.com, aneesh.kumar@kernel.org,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Shuah Khan <shuah@kernel.org>,
Vishal Annapurve <vannapurve@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Vlastimil Babka <vbabka@kernel.org>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
Ackerley Tng <ackerleytng@google.com>
Subject: [PATCH RFC v4 11/44] KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion safety check
Date: Thu, 26 Mar 2026 15:24:20 -0700 [thread overview]
Message-ID: <20260326-gmem-inplace-conversion-v4-11-e202fe950ffd@google.com> (raw)
In-Reply-To: <20260326-gmem-inplace-conversion-v4-0-e202fe950ffd@google.com>
When checking if a guest_memfd folio is safe for conversion, its refcount
is examined. A folio may be present in a per-CPU lru_add fbatch, which
temporarily increases its refcount. This can lead to a false positive,
incorrectly indicating that the folio is in use and preventing the
conversion, even if it is otherwise safe. The conversion process might not
be on the same CPU that holds the folio in its fbatch, making a simple
per-CPU check insufficient.
To address this, drain all CPUs' lru_add fbatches if an unexpectedly high
refcount is encountered during the safety check. This is performed at most
once per conversion request. Draining only if the folio in question may be
lru cached.
guest_memfd folios are unevictable, so they can only reside in the lru_add
fbatch. If the folio's refcount is still unsafe after draining, then the
conversion is truly deemed unsafe.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
mm/swap.c | 2 ++
virt/kvm/guest_memfd.c | 23 +++++++++++++++++------
2 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/mm/swap.c b/mm/swap.c
index bb19ccbece464..4861661c71fab 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,7 @@
#include <linux/page_idle.h>
#include <linux/local_lock.h>
#include <linux/buffer_head.h>
+#include <linux/kvm_types.h>
#include "internal.h"
@@ -898,6 +899,7 @@ void lru_add_drain_all(void)
lru_add_drain();
}
#endif /* CONFIG_SMP */
+EXPORT_SYMBOL_FOR_KVM(lru_add_drain_all);
atomic_t lru_disable_count = ATOMIC_INIT(0);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 0cff9a85a4c53..20a09d9bbcd2b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -8,6 +8,7 @@
#include <linux/mempolicy.h>
#include <linux/pseudo_fs.h>
#include <linux/pagemap.h>
+#include <linux/swap.h>
#include "kvm_mm.h"
@@ -571,25 +572,35 @@ static bool kvm_gmem_range_has_attributes(struct maple_tree *mt,
return true;
}
-static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
- size_t nr_pages, pgoff_t *err_index)
+static bool kvm_gmem_is_safe_for_conversion(struct inode *inode,
+ pgoff_t start, size_t nr_pages,
+ pgoff_t *err_index)
{
struct address_space *mapping = inode->i_mapping;
const int filemap_get_folios_refcount = 1;
pgoff_t last = start + nr_pages - 1;
struct folio_batch fbatch;
+ bool lru_drained = false;
bool safe = true;
int i;
folio_batch_init(&fbatch);
while (safe && filemap_get_folios(mapping, &start, last, &fbatch)) {
- for (i = 0; i < folio_batch_count(&fbatch); ++i) {
+ for (i = 0; i < folio_batch_count(&fbatch);) {
struct folio *folio = fbatch.folios[i];
- if (folio_ref_count(folio) !=
- folio_nr_pages(folio) + filemap_get_folios_refcount) {
- safe = false;
+ safe = (folio_ref_count(folio) ==
+ folio_nr_pages(folio) +
+ filemap_get_folios_refcount);
+
+ if (safe) {
+ ++i;
+ } else if (folio_may_be_lru_cached(folio) &&
+ !lru_drained) {
+ lru_add_drain_all();
+ lru_drained = true;
+ } else {
*err_index = folio->index;
break;
}
--
2.53.0.1018.g2bb0e51243-goog
next prev parent reply other threads:[~2026-03-26 22:24 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-26 22:24 [PATCH RFC v4 00/44] guest_memfd: In-place conversion support Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 01/44] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 02/44] KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 03/44] KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 04/44] KVM: Stub in ability to disable per-VM memory attribute tracking Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 05/44] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 06/44] KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 07/44] KVM: guest_memfd: Only prepare folios for private pages Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 08/44] KVM: Introduce KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 09/44] KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` Ackerley Tng [this message]
2026-03-26 22:24 ` [PATCH RFC v4 12/44] KVM: guest_memfd: Introduce default handlers for content modes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 13/44] KVM: guest_memfd: Apply content modes while setting memory attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 14/44] KVM: x86: Add support for applying content modes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 15/44] KVM: Add CAP to enumerate supported SET_MEMORY_ATTRIBUTES2 flags Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 16/44] KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 17/44] KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 18/44] KVM: selftests: Create gmem fd before "regular" fd when adding memslot Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 19/44] KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset} Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 20/44] KVM: selftests: Add support for mmap() on guest_memfd in core library Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 21/44] KVM: selftests: Add selftests global for guest memory attributes capability Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 22/44] KVM: selftests: Update framework to use KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 23/44] KVM: selftests: Add helpers for calling ioctls on guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 24/44] KVM: selftests: Test using guest_memfd for guest private memory Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 25/44] KVM: selftests: Test basic single-page conversion flow Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 26/44] KVM: selftests: Test conversion flow when INIT_SHARED Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 27/44] KVM: selftests: Test conversion precision in guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 28/44] KVM: selftests: Test conversion before allocation Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 29/44] KVM: selftests: Convert with allocated folios in different layouts Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 30/44] KVM: selftests: Test that truncation does not change shared/private status Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 31/44] KVM: selftests: Test that shared/private status is consistent across processes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 32/44] KVM: selftests: Test conversion with elevated page refcount Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 33/44] KVM: selftests: Test that conversion to private does not support ZERO Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 34/44] KVM: selftests: Support checking that data not equal expected Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 35/44] KVM: selftests: Test that not specifying a conversion flag scrambles memory contents Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 36/44] KVM: selftests: Reset shared memory after hole-punching Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 37/44] KVM: selftests: Provide function to look up guest_memfd details from gpa Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 38/44] KVM: selftests: Provide common function to set memory attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 39/44] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 40/44] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 41/44] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 42/44] KVM: selftests: Add script to exercise private_mem_conversions_test Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 43/44] KVM: selftests: Update pre-fault test to work with per-guest_memfd attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 44/44] KVM: selftests: Update private memory exits test to work with per-gmem attributes Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 0/6] guest_memfd in-place conversion selftests for SNP Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 1/6] KVM: selftests: Initialize guest_memfd with INIT_SHARED Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 2/6] KVM: selftests: Call snp_launch_update_data() providing copy of memory Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 3/6] KVM: selftests: Make guest_code_xsave more friendly Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 4/6] KVM: selftests: Allow specifying CoCo-privateness while mapping a page Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 5/6] KVM: selftests: Test conversions for SNP Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 6/6] KVM: selftests: Test content modes ZERO and PRESERVE " Ackerley Tng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260326-gmem-inplace-conversion-v4-11-e202fe950ffd@google.com \
--to=ackerleytng@google.com \
--cc=aik@amd.com \
--cc=akpm@linux-foundation.org \
--cc=andrew.jones@linux.dev \
--cc=aneesh.kumar@kernel.org \
--cc=axelrasmussen@google.com \
--cc=baohua@kernel.org \
--cc=bhe@redhat.com \
--cc=binbin.wu@linux.intel.com \
--cc=bp@alien8.de \
--cc=brauner@kernel.org \
--cc=chao.p.peng@linux.intel.com \
--cc=chrisl@kernel.org \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=david@kernel.org \
--cc=forkloop@google.com \
--cc=hpa@zytor.com \
--cc=ira.weiny@intel.com \
--cc=jgg@ziepe.ca \
--cc=jmattson@google.com \
--cc=jroedel@suse.de \
--cc=jthoughton@google.com \
--cc=kasong@tencent.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=michael.roth@amd.com \
--cc=mingo@redhat.com \
--cc=nphamcs@gmail.com \
--cc=oupton@kernel.org \
--cc=pankaj.gupta@amd.com \
--cc=pbonzini@redhat.com \
--cc=pratyush@kernel.org \
--cc=qperret@google.com \
--cc=rick.p.edgecombe@intel.com \
--cc=rientjes@google.com \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=shikemeng@huaweicloud.com \
--cc=shivankg@amd.com \
--cc=shuah@kernel.org \
--cc=skhan@linuxfoundation.org \
--cc=steven.price@arm.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=tglx@kernel.org \
--cc=vannapurve@google.com \
--cc=vbabka@kernel.org \
--cc=weixugc@google.com \
--cc=willy@infradead.org \
--cc=wyihan@google.com \
--cc=x86@kernel.org \
--cc=yan.y.zhao@intel.com \
--cc=yuanchu@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox