From: Sean Christopherson <seanjc@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: ackerleytng@google.com, aik@amd.com, andrew.jones@linux.dev,
binbin.wu@linux.intel.com, brauner@kernel.org,
chao.p.peng@linux.intel.com, david@kernel.org,
jmattson@google.com, jthoughton@google.com,
michael.roth@amd.com, oupton@kernel.org, pankaj.gupta@amd.com,
qperret@google.com, rick.p.edgecombe@intel.com,
rientjes@google.com, shivankg@amd.com, steven.price@arm.com,
willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com,
forkloop@google.com, pratyush@kernel.org,
suzuki.poulose@arm.com, aneesh.kumar@kernel.org,
liam@infradead.org, Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Shuah Khan <shuah@kernel.org>,
Vishal Annapurve <vannapurve@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Barry Song <baohua@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Youngjun Park <youngjun.park@lge.com>,
Qi Zheng <qi.zheng@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Kiryl Shutsemau <kas@kernel.org>,
Baoquan He <baoquan.he@linux.dev>, Jason Gunthorpe <jgg@ziepe.ca>,
Vlastimil Babka <vbabka@kernel.org>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
linux-coco@lists.linux.dev
Subject: Re: [PATCH v8 15/46] KVM: guest_memfd: Call arch invalidate hooks on conversion
Date: Mon, 22 Jun 2026 18:15:45 -0700
Message-ID: <ajneQVLriUshjFIO@google.com>
In-Reply-To: <CA+EHjTx+3U++dnhGEkwh2SO82xMugAvvJ9ee1O__sxZCKL_X5A@mail.gmail.com>
On Fri, Jun 19, 2026, Fuad Tabba wrote:
> On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
> <devnull+ackerleytng.google.com@kernel.org> wrote:
> >
> > From: Ackerley Tng <ackerleytng@google.com>
> >
> > When memory in guest_memfd is converted from private to shared, the
> > platform-specific state associated with the guest-private pages must be
> > invalidated or cleaned up.
> >
> > Iterate over the folios in the affected range and call the
> > kvm_arch_gmem_invalidate() hook for each PFN range. This allows
> > architectures to perform necessary teardown, such as updating hardware
> > metadata or encryption states, before the pages are transitioned to the
> > shared state.
> >
> > Invoke this helper after indicating to KVM's mmu code that an invalidation
> > is in progress to stop in-flight page faults from succeeding.
> >
> > Reviewed-by: Fuad Tabba <tabba@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>
> Coming back to this after working through the arm64/pKVM side. My
> Reviewed-by here is from the previous round and the patch hasn't
> changed, but I missed an implication for arm64.
>
> kvm_arch_gmem_invalidate() is now called from two paths with the same
> (start, end) signature: folio teardown (kvm_gmem_free_folio) and
> private->shared conversion (here). For SNP/TDX that's fine, conversion is
> destructive anyway. For pKVM the two need opposite content semantics:
> conversion must preserve the page in place (same physical page, the point
> of in-place conversion without encryption), while teardown must scrub it
> before returning it to the host.
>
> The hook gets only a pfn range with no indication of which caller it's
> serving, so arm64 can't give the two paths the behaviour they need. It
> would help to signal intent on the conversion path: a reason/flag, a
> separate hook, or not routing non-destructive conversion through the
> teardown hook.
>
> arm64 isn't here yet, so this isn't urgent, but the hook is gaining a
> second caller now, and it's cheaper to leave room for the distinction
> than to change a generic contract other arches depend on later.

Crud.  It may not be urgent for arm64, but it's urgent for other reasons that
I "can't" describe in detail at the moment, and even if that weren't the case, I
think we should clean things up now.  More below.

> > virt/kvm/guest_memfd.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 41 insertions(+)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 433f79047b9d1..3c94442bc8131 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -607,6 +607,42 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
> > return safe;
> > }
> >
> > +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> > +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)

Not your fault, but kvm_arch_gmem_invalidate() is badly misnamed.  It's not
"invalidating" anything, it's much more of a "free" callback, as SNP uses it to
put physical pages back into a shared state when a maybe-private folio is freed.

As Fuad points out, (ab)using that hook for the private=>shared conversion case
"works", but not broadly.  And it makes the bad name worse, because it's called
from code that _is_ doing true invalidations.  For pKVM, it may not even need to
do anything invalidation-like.
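
To make the ambiguity concrete, here is a rough userspace sketch (not kernel
code) of the "reason" idea Fuad floated.  Every name in it (gmem_hook_reason,
toy_page, toy_arch_hook) is made up for illustration and appears nowhere in the
posted series.  It models only the pKVM-style policy described above: scrub on
teardown, preserve contents on conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reason codes; nothing like this exists in the posted series. */
enum gmem_hook_reason {
	GMEM_REASON_FREE_FOLIO,		/* page is being returned to the host */
	GMEM_REASON_MAKE_SHARED,	/* private=>shared, page stays in place */
};

/* Toy stand-in for a single guest page. */
struct toy_page {
	unsigned char data[16];
	bool is_private;
};

/*
 * Toy model of a pKVM-like arch hook: contents are scrubbed only when the
 * page is being freed, and are left untouched on an in-place conversion.
 * Either way the page ends up shared.
 */
static void toy_arch_hook(struct toy_page *pages, size_t nr,
			  enum gmem_hook_reason reason)
{
	assert(pages != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (reason == GMEM_REASON_FREE_FOLIO)
			memset(pages[i].data, 0, sizeof(pages[i].data));
		pages[i].is_private = false;
	}
}
```

With only a pfn range and no reason, the hook above would have to pick one of
the two behaviors for both callers, which is exactly the problem.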

To avoid a conflict with patches that are going to have priority over this series,
to set the stage for arm64 support, and to avoid bleeding vendor details into
guest_memfd as if they were core guest_memfd behavior (only SNP needs the
"invalidation" on this specific transition), I think we should add an arch hook
to do conversions straightaway.

Unless there's a clever option I'm missing, it'll mean adding yet another
HAVE_KVM_ARCH_GMEM_XXX flag?  Hmm, especially because IIUC, arm64/pKVM doesn't
need a callback for this case, only the free_folio case.

> > +{
> > + struct folio_batch fbatch;
> > + pgoff_t next = start;
> > + int i;
> > +
> > + folio_batch_init(&fbatch);
> > + while (filemap_get_folios(inode->i_mapping, &next, end - 1, &fbatch)) {
> > + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> > + struct folio *folio = fbatch.folios[i];
> > + pgoff_t start_index, end_index;
> > + kvm_pfn_t start_pfn, end_pfn;
> > +
> > + start_index = max(start, folio->index);
> > + end_index = min(end, folio_next_index(folio));
> > + /*
> > + * end_index is either in folio or points to
> > + * the first page of the next folio. Hence,
> > + * all pages in range [start_index, end_index)
> > + * are contiguous.
> > + */
> > + start_pfn = folio_file_pfn(folio, start_index);
> > + end_pfn = start_pfn + end_index - start_index;
> > +
> > + kvm_arch_gmem_invalidate(start_pfn, end_pfn);
> > + }
> > +
> > + folio_batch_release(&fbatch);
> > + cond_resched();
> > + }
> > +}
> > +#else
> > +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {}
> > +#endif
> > +
> > static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> > size_t nr_pages, uint64_t attrs,
> > pgoff_t *err_index)
> > @@ -647,7 +683,12 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> > */
> >
> > kvm_gmem_invalidate_start(inode, start, end);
> > +
> > + if (!to_private)
> > + kvm_gmem_invalidate(inode, start, end);

E.g. instead make this something like this?

	kvm_gmem_set_pfn_attributes(...)

Hrm, though that wastes folio lookups in the to_private case.  So maybe just this,
assuming pKVM doesn't need to take additional action on conversions?

	if (!to_private)
		kvm_gmem_make_shared(...)

Actually, if we do that, then we don't need a separate arch hook, just a separate
config.  It'll still bleed SNP details into guest_memfd, but it'll at least be
done in a way that's more explicitly arch specific (and it's no different than
what we already do for PREPARE...).

E.g. this?  There will still be a looming rename conflict, but that's easy enough
to handle.

diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index 9ce5be7843f2..8aead0abd788 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -648,8 +648,8 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
 	return safe;
 }

-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
+#ifdef CONFIG_KVM_ARCH_GMEM_FREE_ON_SHARED_CONVERSION
+static void kvm_gmem_make_shared(struct inode *inode, pgoff_t start, pgoff_t end)
 {
 	struct folio_batch fbatch;
 	pgoff_t next = start;
@@ -681,7 +681,7 @@ static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
 	}
 }
 #else
-static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {}
+static void kvm_gmem_make_shared(struct inode *inode, pgoff_t start, pgoff_t end) { }
 #endif

 static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
@@ -729,7 +729,7 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
 	kvm_gmem_invalidate_start(inode, start, end);

 	if (!to_private)
-		kvm_gmem_invalidate(inode, start, end);
+		kvm_gmem_make_shared(inode, start, end);
 	mas_store_prealloc(&mas, xa_mk_value(attrs));
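
As a sanity check on the per-folio clamping that the renamed helper keeps from
the original patch, the index/PFN arithmetic can be modelled in userspace.  This
is only a sketch: toy_folio, pfn_range, and toy_clamp_to_folio() are invented
for illustration, and the toy folio carries its own base PFN instead of going
through folio_file_pfn().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgoff_t;
typedef unsigned long kvm_pfn_t;

/* Toy stand-in for a folio: a run of physically contiguous pages. */
struct toy_folio {
	pgoff_t index;		/* first file index covered by the folio */
	unsigned long nr_pages;	/* number of pages in the folio */
	kvm_pfn_t pfn;		/* PFN backing 'index' */
};

struct pfn_range {
	kvm_pfn_t start;
	kvm_pfn_t end;		/* exclusive */
};

/*
 * Clamp the file range [start, end) to the folio and translate the result
 * to a PFN range, mirroring the max()/min() logic in the quoted loop.
 */
static struct pfn_range toy_clamp_to_folio(const struct toy_folio *folio,
					   pgoff_t start, pgoff_t end)
{
	pgoff_t folio_end = folio->index + folio->nr_pages;
	pgoff_t start_index = start > folio->index ? start : folio->index;
	pgoff_t end_index = end < folio_end ? end : folio_end;
	struct pfn_range r;

	/* Callers are expected to pass only folios overlapping the range. */
	assert(start_index < end_index);

	r.start = folio->pfn + (start_index - folio->index);
	r.end = r.start + (end_index - start_index);
	return r;
}
```

E.g. a 4-page folio at index 8 backed by PFN 0x1000, converted over [10, 20),
yields the PFN range [0x1002, 0x1004): the first two pages of the folio are
outside the request and are left alone.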