public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache
@ 2026-03-10  6:36 Takahiro Itazuri
  2026-03-10  6:41 ` [RFC PATCH v3 1/6] KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs Takahiro Itazuri
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-10  6:36 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
	David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
	Takahiro Itazuri

[ based on v6.18 with [1] ]

This patch series is a follow-up to RFC v2 with minor fixes.  (It is
still labelled RFC because its dependency [1] has not yet been merged.)
The series was tested with guest_memfd created with
GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_NO_DIRECT_MAP on the
Firecracker feature branch [2].

=== Problem Statement ===

gfn_to_pfn_cache (a.k.a. pfncache) does not work with guest_memfd.  As
of today, pfncaches resolve PFNs via hva_to_pfn(), which requires a
userspace mapping and relies on GUP.  This fails for guest_memfd in two
ways:

  * guest_memfd created without GUEST_MEMFD_FLAG_MMAP does not have a
    userspace mapping for private memory.

  * guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP uses an
    AS_NO_DIRECT_MAP mapping, which is rejected by GUP.

In addition, pfncaches map RAM pages via kmap(), which typically returns
an address derived from the direct map, so kmap() cannot be used for
NO_DIRECT_MAP guest_memfd.  pfncaches require fault-free KHVAs because
they can be used from atomic context; thus they cannot fall back to
access via a userspace mapping as KVM does for other accesses to
NO_DIRECT_MAP guest_memfd.

Introducing guest_memfd support also requires invalidation paths beyond
the existing MMU notifier path: one from guest_memfd invalidation and
another from memory attribute updates.

=== Core Approach ===

The core part keeps the original approach in RFC v1:

  * Resolve PFNs for guest_memfd-backed GPAs via kvm_gmem_get_pfn()

  * Obtain a fault-free KHVA for NO_DIRECT_MAP pages via vmap()

=== Main Change since RFC v1 ===

  * Hook pfncache invalidation into guest_memfd invalidation (punch hole
    / release / error handling) as well as into memory attribute updates
    (conversions between shared and private memory).

=== Design Considerations (Feedback Appreciated) ===

To implement the above change, this series tries to reuse as much of the
existing invalidation and retry infrastructure as possible.  The
following points are potential design trade-offs where feedback is
especially welcome:

  * Generalize and reuse the existing mn_active_invalidate_count
    (renamed to active_invalidate_count).  This allows reusing the
    existing pfncache retry logic as-is and enables invalidating
    pfncaches without holding mmu_lock from guest_memfd invalidation
    context.  As a side effect, swapping the active memslots is blocked
    while active_invalidate_count > 0.  To avoid this, a dedicated
    counter such as gmem_active_invalidate_count could be introduced in
    struct kvm instead.

  * Although both guest_memfd invalidation and memory attribute updates
    are driven by GFN ranges, pfncache invalidation is performed using
    HVA ranges, reusing the existing function.  This is because
    GPA-based pfncaches translate GPA->UHVA->PFN and therefore have
    memslot/GPA info, whereas HVA-based pfncaches resolve PFN directly
    from UHVA and do not store memslot/GPA info.  Using GFN-based
    invalidation would therefore miss HVA-based pfncaches.  Technically,
    it would be possible to refactor HVA-based pfncaches to search for
    and retain the corresponding memslot/GPA at activation / refresh
    time instead of at invalidation time.

  * pfncaches are not dynamically allocated but are statically allocated
    on a per-VM and per-vCPU basis.  For a normal VM (i.e. non-Xen),
    there is one pfncache per vCPU.  For a Xen VM, there is one per-VM
    pfncache and five per-vCPU pfncaches.  Given the maximum of 1024
    vCPUs, a normal VM can have up to 1024 pfncaches, consuming 4 MB of
    virtual address space.  A Xen VM can have up to 5121 pfncaches,
    consuming approximately 20 MB of virtual address space.  Although
    the vmalloc area is limited on 32-bit systems, it should be large
    enough and typically tens of TB on 64-bit systems (e.g. 32 TB for
    4-level paging and 12800 TB for 5-level paging on x86_64).  If
    virtual address space exhaustion becomes a concern, migration to an
    mm-local region (forthcoming mermap?) could be considered in the
    future.  Note that vmap() only creates virtual mappings to existing
    pages; it does not allocate new physical pages.

  * With this patch series, HVA-based pfncaches always resolve PFNs
    via hva_to_pfn(), and thus activation for NO_DIRECT_MAP guest_memfd
    fails.  It is technically possible to support this scenario, but it
    would require searching the corresponding memslot and GPA from the
    given UHVA in order to determine whether it is backed by
    guest_memfd.  Doing so would add some overhead to the HVA-based
    pfncache activation / refresh paths regardless of whether the cache
    is backed by guest_memfd.  At the time of writing,
    only Xen uses HVA-based pfncaches.

=== Changelog ===

Changes since RFC v2:
- Drop avoidance of silent kvm-clock activation failure.
- Fix a compile error for kvm_for_each_memslot().

Changes since RFC v1:
- Prevent kvm-clock activation from failing silently.
- Generalize serialization mechanism for invalidation.
- Hook pfncache invalidation into guest_memfd invalidation and memory
  attribute updates.

RFC v2: https://lore.kernel.org/all/20260226135309.29493-1-itazur@amazon.com/
RFC v1: https://lore.kernel.org/all/20251203144159.6131-1-itazur@amazon.com/

[1]: https://lore.kernel.org/all/20260126164445.11867-1-kalyazin@amazon.com/
[2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding

Takahiro Itazuri (6):
  KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs
  KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP
  KVM: Rename invalidate_begin to invalidate_start for consistency
  KVM: pfncache: Rename invalidate_start() helper
  KVM: Rename mn_* invalidate-related fields to generic ones
  KVM: pfncache: Invalidate on gmem invalidation and memattr updates

 Documentation/virt/kvm/locking.rst |   8 +-
 arch/x86/kvm/mmu/mmu.c             |   2 +-
 include/linux/kvm_host.h           |  13 ++--
 include/linux/mmu_notifier.h       |   4 +-
 virt/kvm/guest_memfd.c             |  64 ++++++++++++++--
 virt/kvm/kvm_main.c                | 101 ++++++++++++++++++-------
 virt/kvm/kvm_mm.h                  |  12 +--
 virt/kvm/pfncache.c                | 114 ++++++++++++++++++++---------
 8 files changed, 229 insertions(+), 89 deletions(-)

-- 
2.50.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 1/6] KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs
  2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
@ 2026-03-10  6:41 ` Takahiro Itazuri
  2026-03-10  6:43 ` [RFC PATCH v3 2/6] KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP Takahiro Itazuri
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-10  6:41 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
	David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
	Takahiro Itazuri, Takahiro Itazuri

Currently, pfncaches always resolve PFNs via hva_to_pfn(), which
requires a userspace mapping and relies on GUP.  This fails for
guest_memfd in two ways:

  * guest_memfd created without GUEST_MEMFD_FLAG_MMAP does not have a
    userspace mapping for private memory.

  * guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP uses an
    AS_NO_DIRECT_MAP mapping, which is rejected by GUP.

Resolve PFNs via kvm_gmem_get_pfn() for guest_memfd-backed and GPA-based
pfncaches.  Otherwise, fall back to the existing hva_to_pfn().

Note that HVA-based pfncaches always resolve PFNs via hva_to_pfn(), and
thus activation of HVA-based pfncaches for NO_DIRECT_MAP guest_memfd
fails.  Supporting this scenario would be technically possible, but
would require searching the corresponding memslot and GPA from the given
UHVA in order to determine whether it is backed by guest_memfd.  Doing
so would add some overhead to the HVA-based pfncache activation /
refresh paths regardless of whether the cache is backed by guest_memfd.
At the time of writing, only Xen uses HVA-based pfncaches.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
 virt/kvm/pfncache.c | 45 +++++++++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 728d2c1b488a..5d16e2b8a6eb 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -152,7 +152,36 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
 	return kvm->mmu_invalidate_seq != mmu_seq;
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
+static inline bool gpc_is_gmem_backed(struct gfn_to_pfn_cache *gpc)
+{
+	/* For HVA-based pfncaches, memslot is NULL */
+	return gpc->memslot && kvm_slot_has_gmem(gpc->memslot) &&
+	       (kvm_memslot_is_gmem_only(gpc->memslot) ||
+		kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpc->gpa)));
+}
+
+static kvm_pfn_t gpc_to_pfn(struct gfn_to_pfn_cache *gpc, struct page **page)
+{
+	if (gpc_is_gmem_backed(gpc)) {
+		kvm_pfn_t pfn;
+
+		if (kvm_gmem_get_pfn(gpc->kvm, gpc->memslot,
+				     gpa_to_gfn(gpc->gpa), &pfn, page, NULL))
+			return KVM_PFN_ERR_FAULT;
+
+		return pfn;
+	}
+
+	return hva_to_pfn(&(struct kvm_follow_pfn) {
+		.slot = gpc->memslot,
+		.gfn = gpa_to_gfn(gpc->gpa),
+		.flags = FOLL_WRITE,
+		.hva = gpc->uhva,
+		.refcounted_page = page,
+	});
+}
+
+static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 {
 	/* Note, the new page offset may be different than the old! */
 	void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
@@ -161,14 +190,6 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	unsigned long mmu_seq;
 	struct page *page;
 
-	struct kvm_follow_pfn kfp = {
-		.slot = gpc->memslot,
-		.gfn = gpa_to_gfn(gpc->gpa),
-		.flags = FOLL_WRITE,
-		.hva = gpc->uhva,
-		.refcounted_page = &page,
-	};
-
 	lockdep_assert_held(&gpc->refresh_lock);
 
 	lockdep_assert_held_write(&gpc->lock);
@@ -206,7 +227,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 			cond_resched();
 		}
 
-		new_pfn = hva_to_pfn(&kfp);
+		new_pfn = gpc_to_pfn(gpc, &page);
 		if (is_error_noslot_pfn(new_pfn))
 			goto out_error;
 
@@ -319,7 +340,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 		}
 	}
 
-	/* Note: the offset must be correct before calling hva_to_pfn_retry() */
+	/* Note: the offset must be correct before calling gpc_to_pfn_retry() */
 	gpc->uhva += page_offset;
 
 	/*
@@ -327,7 +348,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
 	if (!gpc->valid || hva_change) {
-		ret = hva_to_pfn_retry(gpc);
+		ret = gpc_to_pfn_retry(gpc);
 	} else {
 		/*
 		 * If the HVA→PFN mapping was already valid, don't unmap it.
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 2/6] KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP
  2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
  2026-03-10  6:41 ` [RFC PATCH v3 1/6] KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs Takahiro Itazuri
@ 2026-03-10  6:43 ` Takahiro Itazuri
  2026-03-10  6:43 ` [RFC PATCH v3 3/6] KVM: Rename invalidate_begin to invalidate_start for consistency Takahiro Itazuri
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-10  6:43 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
	David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
	Takahiro Itazuri, Takahiro Itazuri

Currently, pfncaches map RAM pages via kmap(), which typically returns a
kernel address derived from the direct map.  However, guest_memfd
created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP has its pages removed from
the direct map and uses an AS_NO_DIRECT_MAP mapping, so kmap() cannot be
used in this case.

pfncaches can be used from atomic context where page faults cannot be
tolerated.  Therefore, they cannot fall back to access via a userspace
mapping as KVM does for other accesses to NO_DIRECT_MAP guest_memfd.

To obtain a fault-free kernel host virtual address (KHVA), use vmap()
for NO_DIRECT_MAP pages.  Since gpc_map() is the sole producer of KHVAs
for pfncaches and only vmap() returns a vmalloc address, gpc_unmap() can
reliably pair it with vunmap() by checking is_vmalloc_addr().

Although vm_map_ram() could be faster than vmap(), mixing short-lived
and long-lived vm_map_ram() mappings can lead to fragmentation, so
vm_map_ram() is recommended only for short-lived mappings.  Since
pfncaches typically have a lifetime comparable to that of the VM,
vm_map_ram() is deliberately not used here.

pfncaches are not dynamically allocated but are statically allocated on
a per-VM and per-vCPU basis.  For a normal VM (i.e. non-Xen), there is
one pfncache per vCPU.  For a Xen VM, there is one per-VM pfncache and
five per-vCPU pfncaches.  Given the maximum of 1024 vCPUs, a normal VM
can have up to 1024 pfncaches, consuming 4 MB of virtual address space.
A Xen VM can have up to 5121 pfncaches, consuming approximately 20 MB of
virtual address space.  Although the vmalloc area is limited on 32-bit
systems, it should be large enough and typically tens of TB on 64-bit
systems (e.g. 32 TB for 4-level paging and 12800 TB for 5-level paging
on x86_64).  If virtual address space exhaustion becomes a concern,
migration to an mm-local region (like forthcoming mermap?) could be
considered in the future.  Note that vmap() and vm_map_ram() only create
virtual mappings to existing pages; they do not allocate new physical
pages.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
 virt/kvm/pfncache.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 5d16e2b8a6eb..0b49ba98f33f 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -16,6 +16,7 @@
 #include <linux/highmem.h>
 #include <linux/module.h>
 #include <linux/errno.h>
+#include <linux/pagemap.h>
 
 #include "kvm_mm.h"
 
@@ -98,8 +99,19 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 
 static void *gpc_map(kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return kmap(pfn_to_page(pfn));
+	if (pfn_valid(pfn)) {
+		struct page *page = pfn_to_page(pfn);
+		struct page *head = compound_head(page);
+		struct address_space *mapping = READ_ONCE(head->mapping);
+
+		if (mapping && mapping_no_direct_map(mapping)) {
+			struct page *pages[] = { page };
+
+			return vmap(pages, 1, VM_MAP, PAGE_KERNEL);
+		}
+
+		return kmap(page);
+	}
 
 #ifdef CONFIG_HAS_IOMEM
 	return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
@@ -115,7 +127,15 @@ static void gpc_unmap(kvm_pfn_t pfn, void *khva)
 		return;
 
 	if (pfn_valid(pfn)) {
-		kunmap(pfn_to_page(pfn));
+		/*
+		 * For valid PFNs, gpc_map() returns either a kmap() address
+		 * (non-vmalloc) or a vmap() address (vmalloc).
+		 */
+		if (is_vmalloc_addr(khva))
+			vunmap(khva);
+		else
+			kunmap(pfn_to_page(pfn));
+
 		return;
 	}
 
@@ -233,8 +253,11 @@ static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 
 		/*
 		 * Obtain a new kernel mapping if KVM itself will access the
-		 * pfn.  Note, kmap() and memremap() can both sleep, so this
-		 * too must be done outside of gpc->lock!
+		 * pfn.  Note, kmap(), vmap() and memremap() can all sleep, so
+		 * this too must be done outside of gpc->lock!
+		 * Note that even though gpc->lock is dropped, it's still fine
+		 * to read gpc->pfn and other fields because gpc->refresh_lock
+		 * mutex prevents them from being updated.
 		 */
 		if (new_pfn == gpc->pfn)
 			new_khva = old_khva;
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 3/6] KVM: Rename invalidate_begin to invalidate_start for consistency
  2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
  2026-03-10  6:41 ` [RFC PATCH v3 1/6] KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs Takahiro Itazuri
  2026-03-10  6:43 ` [RFC PATCH v3 2/6] KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP Takahiro Itazuri
@ 2026-03-10  6:43 ` Takahiro Itazuri
  2026-03-11 20:53   ` Sean Christopherson
  2026-03-10  6:43 ` [RFC PATCH v3 4/6] KVM: pfncache: Rename invalidate_start() helper Takahiro Itazuri
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-10  6:43 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
	David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
	Takahiro Itazuri, Takahiro Itazuri

Most MMU-related helpers use the "_start" suffix.  Align with the
prevailing naming convention for consistency across the MMU-related
code.

```
$ git grep -E "invalidate(_range)?_start" | wc -l
123

$ git grep -E "invalidate(_range)?_begin" | wc -l
14
```

No functional change intended.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
 arch/x86/kvm/mmu/mmu.c       |  2 +-
 include/linux/kvm_host.h     |  2 +-
 include/linux/mmu_notifier.h |  4 ++--
 virt/kvm/guest_memfd.c       | 14 +++++++-------
 virt/kvm/kvm_main.c          |  6 +++---
 5 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d3e705ac4c6f..e82a357e2219 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6859,7 +6859,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 
 	write_lock(&kvm->mmu_lock);
 
-	kvm_mmu_invalidate_begin(kvm);
+	kvm_mmu_invalidate_start(kvm);
 
 	kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2ea5d2f172f7..618a71894ed1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1566,7 +1566,7 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
 
-void kvm_mmu_invalidate_begin(struct kvm *kvm);
+void kvm_mmu_invalidate_start(struct kvm *kvm);
 void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
 void kvm_mmu_invalidate_end(struct kvm *kvm);
 bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index d1094c2d5fb6..8ecf36a84e3b 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -134,8 +134,8 @@ struct mmu_notifier_ops {
 	 * Invalidation of multiple concurrent ranges may be
 	 * optionally permitted by the driver. Either way the
 	 * establishment of sptes is forbidden in the range passed to
-	 * invalidate_range_begin/end for the whole duration of the
-	 * invalidate_range_begin/end critical section.
+	 * invalidate_range_start/end for the whole duration of the
+	 * invalidate_range_start/end critical section.
 	 *
 	 * invalidate_range_start() is called when all pages in the
 	 * range are still mapped and have at least a refcount of one.
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5d6e966d4f32..79f34dad0c2f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -206,7 +206,7 @@ static enum kvm_gfn_range_filter kvm_gmem_get_invalidate_filter(struct inode *in
 	return KVM_FILTER_PRIVATE;
 }
 
-static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
+static void __kvm_gmem_invalidate_start(struct gmem_file *f, pgoff_t start,
 					pgoff_t end,
 					enum kvm_gfn_range_filter attr_filter)
 {
@@ -230,7 +230,7 @@ static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
 			found_memslot = true;
 
 			KVM_MMU_LOCK(kvm);
-			kvm_mmu_invalidate_begin(kvm);
+			kvm_mmu_invalidate_start(kvm);
 		}
 
 		flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
@@ -243,7 +243,7 @@ static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
 		KVM_MMU_UNLOCK(kvm);
 }
 
-static void kvm_gmem_invalidate_begin(struct inode *inode, pgoff_t start,
+static void kvm_gmem_invalidate_start(struct inode *inode, pgoff_t start,
 				      pgoff_t end)
 {
 	enum kvm_gfn_range_filter attr_filter;
@@ -252,7 +252,7 @@ static void kvm_gmem_invalidate_begin(struct inode *inode, pgoff_t start,
 	attr_filter = kvm_gmem_get_invalidate_filter(inode);
 
 	kvm_gmem_for_each_file(f, inode->i_mapping)
-		__kvm_gmem_invalidate_begin(f, start, end, attr_filter);
+		__kvm_gmem_invalidate_start(f, start, end, attr_filter);
 }
 
 static void __kvm_gmem_invalidate_end(struct gmem_file *f, pgoff_t start,
@@ -287,7 +287,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	 */
 	filemap_invalidate_lock(inode->i_mapping);
 
-	kvm_gmem_invalidate_begin(inode, start, end);
+	kvm_gmem_invalidate_start(inode, start, end);
 
 	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
 
@@ -401,7 +401,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * Zap all SPTEs pointed at by this file.  Do not free the backing
 	 * memory, as its lifetime is associated with the inode, not the file.
 	 */
-	__kvm_gmem_invalidate_begin(f, 0, -1ul,
+	__kvm_gmem_invalidate_start(f, 0, -1ul,
 				    kvm_gmem_get_invalidate_filter(inode));
 	__kvm_gmem_invalidate_end(f, 0, -1ul);
 
@@ -582,7 +582,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 	start = folio->index;
 	end = start + folio_nr_pages(folio);
 
-	kvm_gmem_invalidate_begin(mapping->host, start, end);
+	kvm_gmem_invalidate_start(mapping->host, start, end);
 
 	/*
 	 * Do not truncate the range, what action is taken in response to the
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 60a8b7ca8ab4..5871882ff1db 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -678,7 +678,7 @@ static __always_inline int kvm_age_hva_range_no_flush(struct mmu_notifier *mn,
 	return kvm_age_hva_range(mn, start, end, handler, false);
 }
 
-void kvm_mmu_invalidate_begin(struct kvm *kvm)
+void kvm_mmu_invalidate_start(struct kvm *kvm)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 	/*
@@ -734,7 +734,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 		.start		= range->start,
 		.end		= range->end,
 		.handler	= kvm_mmu_unmap_gfn_range,
-		.on_lock	= kvm_mmu_invalidate_begin,
+		.on_lock	= kvm_mmu_invalidate_start,
 		.flush_on_ret	= true,
 		.may_block	= mmu_notifier_range_blockable(range),
 	};
@@ -2571,7 +2571,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 		.end = end,
 		.arg.attributes = attributes,
 		.handler = kvm_pre_set_memory_attributes,
-		.on_lock = kvm_mmu_invalidate_begin,
+		.on_lock = kvm_mmu_invalidate_start,
 		.flush_on_ret = true,
 		.may_block = true,
 	};
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 4/6] KVM: pfncache: Rename invalidate_start() helper
  2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
                   ` (2 preceding siblings ...)
  2026-03-10  6:43 ` [RFC PATCH v3 3/6] KVM: Rename invalidate_begin to invalidate_start for consistency Takahiro Itazuri
@ 2026-03-10  6:43 ` Takahiro Itazuri
  2026-03-10  6:44 ` [RFC PATCH v3 5/6] KVM: Rename mn_* invalidate-related fields to generic ones Takahiro Itazuri
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-10  6:43 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
	David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
	Takahiro Itazuri, Takahiro Itazuri

Rename gfn_to_pfn_cache_invalidate_start() to
gpc_invalidate_hva_range_start() to explicitly indicate that it takes an
HVA range.

No functional changes intended.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
 virt/kvm/kvm_main.c |  2 +-
 virt/kvm/kvm_mm.h   | 12 ++++++------
 virt/kvm/pfncache.c |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5871882ff1db..d64e70f8e8e3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -763,7 +763,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 * mn_active_invalidate_count (see above) instead of
 	 * mmu_invalidate_in_progress.
 	 */
-	gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end);
+	gpc_invalidate_hva_range_start(kvm, range->start, range->end);
 
 	/*
 	 * If one or more memslots were found and thus zapped, notify arch code
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 9fcc5d5b7f8d..abd8e7d33ab0 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -56,13 +56,13 @@ struct kvm_follow_pfn {
 kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp);
 
 #ifdef CONFIG_HAVE_KVM_PFNCACHE
-void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
-				       unsigned long start,
-				       unsigned long end);
+void gpc_invalidate_hva_range_start(struct kvm *kvm,
+				    unsigned long start,
+				    unsigned long end);
 #else
-static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
-						     unsigned long start,
-						     unsigned long end)
+static inline void gpc_invalidate_hva_range_start(struct kvm *kvm,
+						  unsigned long start,
+						  unsigned long end)
 {
 }
 #endif /* HAVE_KVM_PFNCACHE */
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 0b49ba98f33f..bafda64b8916 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -23,8 +23,8 @@
 /*
  * MMU notifier 'invalidate_range_start' hook.
  */
-void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
-				       unsigned long end)
+void gpc_invalidate_hva_range_start(struct kvm *kvm, unsigned long start,
+				    unsigned long end)
 {
 	struct gfn_to_pfn_cache *gpc;
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 5/6] KVM: Rename mn_* invalidate-related fields to generic ones
  2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
                   ` (3 preceding siblings ...)
  2026-03-10  6:43 ` [RFC PATCH v3 4/6] KVM: pfncache: Rename invalidate_start() helper Takahiro Itazuri
@ 2026-03-10  6:44 ` Takahiro Itazuri
  2026-03-11 20:57   ` Sean Christopherson
  2026-03-10  6:44 ` [RFC PATCH v3 6/6] KVM: pfncache: Invalidate on gmem invalidation and memattr updates Takahiro Itazuri
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-10  6:44 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
	David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
	Takahiro Itazuri, Takahiro Itazuri

The addition of guest_memfd support to pfncaches introduces additional
sources of pfncache invalidation beyond the MMU notifier path.  The
existing mn_* naming implies that these fields are relevant only to MMU
notifiers, which is no longer true.

No functional changes intended.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
 Documentation/virt/kvm/locking.rst |  8 +++---
 include/linux/kvm_host.h           | 11 ++++---
 virt/kvm/kvm_main.c                | 46 +++++++++++++++---------------
 virt/kvm/pfncache.c                | 28 +++++++++---------
 4 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index ae8bce7fecbe..73679044ce44 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -20,7 +20,7 @@ The acquisition orders for mutexes are as follows:
 - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
   them together is quite rare.
 
-- kvm->mn_active_invalidate_count ensures that pairs of
+- kvm->active_invalidate_count ensures that pairs of MMU notifier's
   invalidate_range_start() and invalidate_range_end() callbacks
   use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
   are taken on the waiting side when modifying memslots, so MMU notifiers
@@ -249,12 +249,12 @@ time it will be set using the Dirty tracking mechanism described above.
 :Comment:	Exists to allow taking cpus_read_lock() while kvm_usage_count is
 		protected, which simplifies the virtualization enabling logic.
 
-``kvm->mn_invalidate_lock``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
+``kvm->invalidate_lock``
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 :Type:          spinlock_t
 :Arch:          any
-:Protects:      mn_active_invalidate_count, mn_memslots_update_rcuwait
+:Protects:      active_invalidate_count, memslots_update_rcuwait
 
 ``kvm_arch::tsc_write_lock``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 618a71894ed1..7faa83d3d306 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -814,10 +814,13 @@ struct kvm {
 	 */
 	atomic_t nr_memslots_dirty_logging;
 
-	/* Used to wait for completion of MMU notifiers.  */
-	spinlock_t mn_invalidate_lock;
-	unsigned long mn_active_invalidate_count;
-	struct rcuwait mn_memslots_update_rcuwait;
+	/*
+	 * Used by active memslots swap and pfncache refresh to wait for
+	 * invalidation to complete.
+	 */
+	spinlock_t invalidate_lock;
+	unsigned long active_invalidate_count;
+	struct rcuwait memslots_update_rcuwait;
 
 	/* For management / invalidation of gfn_to_pfn_caches */
 	spinlock_t gpc_lock;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d64e70f8e8e3..f51056e971d0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -749,9 +749,9 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 *
 	 * Pairs with the decrement in range_end().
 	 */
-	spin_lock(&kvm->mn_invalidate_lock);
-	kvm->mn_active_invalidate_count++;
-	spin_unlock(&kvm->mn_invalidate_lock);
+	spin_lock(&kvm->invalidate_lock);
+	kvm->active_invalidate_count++;
+	spin_unlock(&kvm->invalidate_lock);
 
 	/*
 	 * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
@@ -760,7 +760,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 * any given time, and the caches themselves can check for hva overlap,
 	 * i.e. don't need to rely on memslot overlap checks for performance.
 	 * Because this runs without holding mmu_lock, the pfn caches must use
-	 * mn_active_invalidate_count (see above) instead of
+	 * active_invalidate_count (see above) instead of
 	 * mmu_invalidate_in_progress.
 	 */
 	gpc_invalidate_hva_range_start(kvm, range->start, range->end);
@@ -819,18 +819,18 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 	kvm_handle_hva_range(kvm, &hva_range);
 
 	/* Pairs with the increment in range_start(). */
-	spin_lock(&kvm->mn_invalidate_lock);
-	if (!WARN_ON_ONCE(!kvm->mn_active_invalidate_count))
-		--kvm->mn_active_invalidate_count;
-	wake = !kvm->mn_active_invalidate_count;
-	spin_unlock(&kvm->mn_invalidate_lock);
+	spin_lock(&kvm->invalidate_lock);
+	if (!WARN_ON_ONCE(!kvm->active_invalidate_count))
+		--kvm->active_invalidate_count;
+	wake = !kvm->active_invalidate_count;
+	spin_unlock(&kvm->invalidate_lock);
 
 	/*
 	 * There can only be one waiter, since the wait happens under
 	 * slots_lock.
 	 */
 	if (wake)
-		rcuwait_wake_up(&kvm->mn_memslots_update_rcuwait);
+		rcuwait_wake_up(&kvm->memslots_update_rcuwait);
 }
 
 static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
@@ -1131,8 +1131,8 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	mutex_init(&kvm->irq_lock);
 	mutex_init(&kvm->slots_lock);
 	mutex_init(&kvm->slots_arch_lock);
-	spin_lock_init(&kvm->mn_invalidate_lock);
-	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
+	spin_lock_init(&kvm->invalidate_lock);
+	rcuwait_init(&kvm->memslots_update_rcuwait);
 	xa_init(&kvm->vcpu_array);
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	xa_init(&kvm->mem_attr_array);
@@ -1299,7 +1299,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	/*
 	 * At this point, pending calls to invalidate_range_start()
 	 * have completed but no more MMU notifiers will run, so
-	 * mn_active_invalidate_count may remain unbalanced.
+	 * active_invalidate_count may remain unbalanced.
 	 * No threads can be waiting in kvm_swap_active_memslots() as the
 	 * last reference on KVM has been dropped, but freeing
 	 * memslots would deadlock without this manual intervention.
@@ -1308,9 +1308,9 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	 * notifier between a start() and end(), then there shouldn't be any
 	 * in-progress invalidations.
 	 */
-	WARN_ON(rcuwait_active(&kvm->mn_memslots_update_rcuwait));
-	if (kvm->mn_active_invalidate_count)
-		kvm->mn_active_invalidate_count = 0;
+	WARN_ON(rcuwait_active(&kvm->memslots_update_rcuwait));
+	if (kvm->active_invalidate_count)
+		kvm->active_invalidate_count = 0;
 	else
 		WARN_ON(kvm->mmu_invalidate_in_progress);
 #else
@@ -1640,17 +1640,17 @@ static void kvm_swap_active_memslots(struct kvm *kvm, int as_id)
 	 * progress, otherwise the locking in invalidate_range_start and
 	 * invalidate_range_end will be unbalanced.
 	 */
-	spin_lock(&kvm->mn_invalidate_lock);
-	prepare_to_rcuwait(&kvm->mn_memslots_update_rcuwait);
-	while (kvm->mn_active_invalidate_count) {
+	spin_lock(&kvm->invalidate_lock);
+	prepare_to_rcuwait(&kvm->memslots_update_rcuwait);
+	while (kvm->active_invalidate_count) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		spin_unlock(&kvm->mn_invalidate_lock);
+		spin_unlock(&kvm->invalidate_lock);
 		schedule();
-		spin_lock(&kvm->mn_invalidate_lock);
+		spin_lock(&kvm->invalidate_lock);
 	}
-	finish_rcuwait(&kvm->mn_memslots_update_rcuwait);
+	finish_rcuwait(&kvm->memslots_update_rcuwait);
 	rcu_assign_pointer(kvm->memslots[as_id], slots);
-	spin_unlock(&kvm->mn_invalidate_lock);
+	spin_unlock(&kvm->invalidate_lock);
 
 	/*
 	 * Acquired in kvm_set_memslot. Must be released before synchronize
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index bafda64b8916..63e08fbac16d 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -147,26 +147,24 @@ static void gpc_unmap(kvm_pfn_t pfn, void *khva)
 static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
 {
 	/*
-	 * mn_active_invalidate_count acts for all intents and purposes
-	 * like mmu_invalidate_in_progress here; but the latter cannot
-	 * be used here because the invalidation of caches in the
-	 * mmu_notifier event occurs _before_ mmu_invalidate_in_progress
-	 * is elevated.
+	 * active_invalidate_count acts for all intents and purposes like
+	 * mmu_invalidate_in_progress here; but the latter cannot be used here
+	 * because the invalidation of caches in the mmu_notifier event occurs
+	 * _before_ mmu_invalidate_in_progress is elevated.
 	 *
-	 * Note, it does not matter that mn_active_invalidate_count
-	 * is not protected by gpc->lock.  It is guaranteed to
-	 * be elevated before the mmu_notifier acquires gpc->lock, and
-	 * isn't dropped until after mmu_invalidate_seq is updated.
+	 * Note, it does not matter that active_invalidate_count is not
+	 * protected by gpc->lock.  It is guaranteed to be elevated before the
+	 * mmu_notifier acquires gpc->lock, and isn't dropped until after
+	 * mmu_invalidate_seq is updated.
 	 */
-	if (kvm->mn_active_invalidate_count)
+	if (kvm->active_invalidate_count)
 		return true;
 
 	/*
-	 * Ensure mn_active_invalidate_count is read before
-	 * mmu_invalidate_seq.  This pairs with the smp_wmb() in
-	 * mmu_notifier_invalidate_range_end() to guarantee either the
-	 * old (non-zero) value of mn_active_invalidate_count or the
-	 * new (incremented) value of mmu_invalidate_seq is observed.
+	 * Ensure active_invalidate_count is read before mmu_invalidate_seq.
+	 * This pairs with the smp_wmb() in kvm_mmu_invalidate_end() to
+	 * guarantee either the old (non-zero) value of active_invalidate_count
+	 * or the new (incremented) value of mmu_invalidate_seq is observed.
 	 */
 	smp_rmb();
 	return kvm->mmu_invalidate_seq != mmu_seq;
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 6/6] KVM: pfncache: Invalidate on gmem invalidation and memattr updates
  2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
                   ` (4 preceding siblings ...)
  2026-03-10  6:44 ` [RFC PATCH v3 5/6] KVM: Rename mn_* invalidate-related fields to generic ones Takahiro Itazuri
@ 2026-03-10  6:44 ` Takahiro Itazuri
  2026-03-11 12:04 ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache David Woodhouse
  2026-03-11 22:32 ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Sean Christopherson
  7 siblings, 0 replies; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-10  6:44 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
	David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
	Takahiro Itazuri, Takahiro Itazuri

Invalidate pfncaches when guest_memfd invalidation or memory attribute
updates render cached PFN resolutions stale.

Reuse active_invalidate_count to synchronize with the existing retry
logic and preserve ordering against mmu_invalidate_seq.

Invalidation needs to be performed using HVA ranges so that both
GPA-based and HVA-based pfncaches are covered.  Internally, GPA-based
pfncaches translate the GPA to a memslot/UHVA first and then resolve the
PFN, while HVA-based ones only resolve the PFN and do not store
memslot/GPA context.  Technically, it is possible to make HVA-based
pfncaches look up the corresponding memslot/GPA when activated or
refreshed, but that would add overhead to a greater or lesser extent,
regardless of whether they are backed by guest_memfd.  At the time of
writing, only Xen uses HVA-based pfncaches.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
---
 virt/kvm/guest_memfd.c | 50 ++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c    | 47 ++++++++++++++++++++++++++++++++++++++-
 virt/kvm/pfncache.c    |  4 ++--
 3 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 79f34dad0c2f..eb2f1a7e54dc 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -215,6 +215,33 @@ static void __kvm_gmem_invalidate_start(struct gmem_file *f, pgoff_t start,
 	struct kvm *kvm = f->kvm;
 	unsigned long index;
 
+	/*
+	 * Prevent pfncaches from being activated / refreshed using stale PFN
+	 * resolutions.  To invalidate pfncaches _before_ invalidating the
+	 * secondary MMUs (i.e. without acquiring mmu_lock), pfncaches must use
+	 * active_invalidate_count instead of mmu_invalidate_in_progress.
+	 */
+	spin_lock(&kvm->invalidate_lock);
+	kvm->active_invalidate_count++;
+	spin_unlock(&kvm->invalidate_lock);
+
+	/*
+	 * Invalidation of pfncaches must be done using a HVA range.  pfncaches
+	 * can be either GPA-based or HVA-based, and all pfncaches store uhva
+	 * while HVA-based pfncaches do not have gpa/memslot info.  Thus,
+	 * using GFN ranges would miss invalidating HVA-based ones.
+	 */
+	xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
+		pgoff_t pgoff = slot->gmem.pgoff;
+		gfn_t gfn_start = slot->base_gfn + max(pgoff, start) - pgoff;
+		gfn_t gfn_end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff;
+
+		unsigned long hva_start = gfn_to_hva_memslot(slot, gfn_start);
+		unsigned long hva_end = gfn_to_hva_memslot(slot, gfn_end);
+
+		gpc_invalidate_hva_range_start(kvm, hva_start, hva_end);
+	}
+
 	xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
 		pgoff_t pgoff = slot->gmem.pgoff;
 
@@ -259,12 +286,35 @@ static void __kvm_gmem_invalidate_end(struct gmem_file *f, pgoff_t start,
 				      pgoff_t end)
 {
 	struct kvm *kvm = f->kvm;
+	bool wake;
 
 	if (xa_find(&f->bindings, &start, end - 1, XA_PRESENT)) {
 		KVM_MMU_LOCK(kvm);
 		kvm_mmu_invalidate_end(kvm);
 		KVM_MMU_UNLOCK(kvm);
 	}
+
+	/*
+	 * This must be done after the increment of mmu_invalidate_seq and
+	 * smp_wmb() in kvm_mmu_invalidate_end() to guarantee that
+	 * gpc_invalidate_retry() observes either the old (non-zero)
+	 * active_invalidate_count or the new (incremented) mmu_invalidate_seq.
+	 */
+	spin_lock(&kvm->invalidate_lock);
+	if (!WARN_ON_ONCE(!kvm->active_invalidate_count))
+		kvm->active_invalidate_count--;
+	wake = !kvm->active_invalidate_count;
+	spin_unlock(&kvm->invalidate_lock);
+
+	/*
+	 * guest_memfd invalidation itself doesn't need to block active memslots
+	 * swap as bindings updates are serialized by filemap_invalidate_lock().
+	 * However, active_invalidate_count is shared with the MMU notifier
+	 * path, so the waiter must be woken when active_invalidate_count drops
+	 * to zero.
+	 */
+	if (wake)
+		rcuwait_wake_up(&kvm->memslots_update_rcuwait);
 }
 
 static void kvm_gmem_invalidate_end(struct inode *inode, pgoff_t start,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f51056e971d0..2ad31e491090 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2583,9 +2583,11 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 		.on_lock = kvm_mmu_invalidate_end,
 		.may_block = true,
 	};
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *slot;
 	unsigned long i;
 	void *entry;
-	int r = 0;
+	int r = 0, bkt;
 
 	entry = attributes ? xa_mk_value(attributes) : NULL;
 
@@ -2609,6 +2611,34 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 		cond_resched();
 	}
 
+	/*
+	 * Prevent pfncaches from being activated / refreshed using stale PFN
+	 * resolutions.  To invalidate pfncaches _before_ invalidating the
+	 * secondary MMUs (i.e. without acquiring mmu_lock), pfncaches must use
+	 * active_invalidate_count instead of mmu_invalidate_in_progress.
+	 */
+	spin_lock(&kvm->invalidate_lock);
+	kvm->active_invalidate_count++;
+	spin_unlock(&kvm->invalidate_lock);
+
+	/*
+	 * Invalidation of pfncaches must be done using a HVA range.  pfncaches
+	 * can be either GPA-based or HVA-based, and all pfncaches store uhva
+	 * while HVA-based pfncaches do not have gpa/memslot info.  Thus,
+	 * using GFN ranges would miss invalidating HVA-based ones.
+	 */
+	kvm_for_each_memslot(slot, bkt, slots) {
+		gfn_t gfn_start = max(start, slot->base_gfn);
+		gfn_t gfn_end = min(end, slot->base_gfn + slot->npages);
+
+		if (gfn_start < gfn_end) {
+			unsigned long hva_start = gfn_to_hva_memslot(slot, gfn_start);
+			unsigned long hva_end = gfn_to_hva_memslot(slot, gfn_end);
+
+			gpc_invalidate_hva_range_start(kvm, hva_start, hva_end);
+		}
+	}
+
 	kvm_handle_gfn_range(kvm, &pre_set_range);
 
 	for (i = start; i < end; i++) {
@@ -2620,6 +2650,21 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 
 	kvm_handle_gfn_range(kvm, &post_set_range);
 
+	/*
+	 * This must be done after the increment of mmu_invalidate_seq and
+	 * smp_wmb() in kvm_mmu_invalidate_end() to guarantee that
+	 * gpc_invalidate_retry() observes either the old (non-zero)
+	 * active_invalidate_count or the new (incremented) mmu_invalidate_seq.
+	 *
+	 * memslots_update_rcuwait does not need to be woken when
+	 * active_invalidate_count drops to zero because active memslots swap is
+	 * also done while holding slots_lock.
+	 */
+	spin_lock(&kvm->invalidate_lock);
+	if (!WARN_ON_ONCE(!kvm->active_invalidate_count))
+		kvm->active_invalidate_count--;
+	spin_unlock(&kvm->invalidate_lock);
+
 out_unlock:
 	mutex_unlock(&kvm->slots_lock);
 
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 63e08fbac16d..42b3b849f78b 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -144,7 +144,7 @@ static void gpc_unmap(kvm_pfn_t pfn, void *khva)
 #endif
 }
 
-static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
+static inline bool gpc_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
 {
 	/*
 	 * active_invalidate_count acts for all intents and purposes like
@@ -274,7 +274,7 @@ static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 		 * attempting to refresh.
 		 */
 		WARN_ON_ONCE(gpc->valid);
-	} while (mmu_notifier_retry_cache(gpc->kvm, mmu_seq));
+	} while (gpc_invalidate_retry(gpc->kvm, mmu_seq));
 
 	gpc->valid = true;
 	gpc->pfn = new_pfn;
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache
  2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
                   ` (5 preceding siblings ...)
  2026-03-10  6:44 ` [RFC PATCH v3 6/6] KVM: pfncache: Invalidate on gmem invalidation and memattr updates Takahiro Itazuri
@ 2026-03-11 12:04 ` David Woodhouse
  2026-03-12 14:02   ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to Takahiro Itazuri
  2026-03-11 22:32 ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Sean Christopherson
  7 siblings, 1 reply; 14+ messages in thread
From: David Woodhouse @ 2026-03-11 12:04 UTC (permalink / raw)
  To: Takahiro Itazuri, kvm, Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
	Paul Durrant, Nikita Kalyazin, Patrick Roy, Takahiro Itazuri

[-- Attachment #1: Type: text/plain, Size: 522 bytes --]

On Tue, 2026-03-10 at 06:36 +0000, Takahiro Itazuri wrote:
> [ based on v6.18 with [1] ]
> 
> This patch series is another follow-up to RFC v1 with minor fixes of RFC
> v2.  (This is still labelled RFC since its dependency [1] has not yet
> been merged.)  This change was tested for guest_memfd created with
> GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_NO_DIRECT_MAP in the feature
> branch of Firecracker [2].

This all looks good to me, thanks. I'd love to see a set of KVM
selftests to exercise it though.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 3/6] KVM: Rename invalidate_begin to invalidate_start for consistency
  2026-03-10  6:43 ` [RFC PATCH v3 3/6] KVM: Rename invalidate_begin to invalidate_start for consistency Takahiro Itazuri
@ 2026-03-11 20:53   ` Sean Christopherson
  2026-03-12 14:17     ` Takahiro Itazuri
  0 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2026-03-11 20:53 UTC (permalink / raw)
  To: Takahiro Itazuri
  Cc: kvm, Paolo Bonzini, Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman,
	David Hildenbrand, David Woodhouse, Paul Durrant, Nikita Kalyazin,
	Patrick Roy, Takahiro Itazuri

On Tue, Mar 10, 2026, Takahiro Itazuri wrote:
> Most MMU-related helpers use "_start" suffix.  Align with the prevailing
> naming convention for consistency across MMU-related codebase.
> 
> ```
> $ git grep -E "invalidate(_range)?_start" | wc -l
> 123
> 
> $ git grep -E "invalidate(_range)?_begin" | wc -l
> 14
> ```

I'm a-ok with the change, but the changelog should make it clear to what KVM is
conforming.  Very specifically, IMO, all that really matters here is aligning with
mmu_notifier_ops.invalidate_range_start().

Because for me, "MMU-related" anything in the context of KVM means just that,
KVM's MMU code.  I don't care what other MMU-related code outside of KVM uses for
naming, if that code doesn't interact with KVM in any way.  And for KVM specifically,
it's a much closer race.

  $ git grep -E "invalidate(_range)?_begin"  **/kvm | wc -l
  11
  $ git grep -E "invalidate(_range)?_start"  **/kvm | wc -l
  16

But as above, I'm definitely in favor of matching mmu_notifier_ops.invalidate_range_start().

> No functional change intended.
> 
> Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
> ---
>  arch/x86/kvm/mmu/mmu.c       |  2 +-
>  include/linux/kvm_host.h     |  2 +-
>  include/linux/mmu_notifier.h |  4 ++--
>  virt/kvm/guest_memfd.c       | 14 +++++++-------
>  virt/kvm/kvm_main.c          |  6 +++---
>  5 files changed, 14 insertions(+), 14 deletions(-)

...

> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2ea5d2f172f7..618a71894ed1 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1566,7 +1566,7 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
>  void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
>  #endif
>  
> -void kvm_mmu_invalidate_begin(struct kvm *kvm);
> +void kvm_mmu_invalidate_start(struct kvm *kvm);
>  void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
>  void kvm_mmu_invalidate_end(struct kvm *kvm);
>  bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index d1094c2d5fb6..8ecf36a84e3b 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -134,8 +134,8 @@ struct mmu_notifier_ops {
>  	 * Invalidation of multiple concurrent ranges may be
>  	 * optionally permitted by the driver. Either way the
>  	 * establishment of sptes is forbidden in the range passed to
> -	 * invalidate_range_begin/end for the whole duration of the
> -	 * invalidate_range_begin/end critical section.
> +	 * invalidate_range_start/end for the whole duration of the
> +	 * invalidate_range_start/end critical section.
>  	 *
>  	 * invalidate_range_start() is called when all pages in the
>  	 * range are still mapped and have at least a refcount of one.

Please move the include/linux/mmu_notifier.h change to its own standalone patch,
as in, not even part of this series.  It's an obviously correct change (the comment
has been wrong since commit cddb8a5c14aa ("mmu-notifiers: core")) and has nothing
to do with KVM.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 5/6] KVM: Rename mn_* invalidate-related fields to generic ones
  2026-03-10  6:44 ` [RFC PATCH v3 5/6] KVM: Rename mn_* invalidate-related fields to generic ones Takahiro Itazuri
@ 2026-03-11 20:57   ` Sean Christopherson
  2026-03-12 14:33     ` Takahiro Itazuri
  0 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2026-03-11 20:57 UTC (permalink / raw)
  To: Takahiro Itazuri
  Cc: kvm, Paolo Bonzini, Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman,
	David Hildenbrand, David Woodhouse, Paul Durrant, Nikita Kalyazin,
	Patrick Roy, Takahiro Itazuri

On Tue, Mar 10, 2026, Takahiro Itazuri wrote:
> The addition of guest_memfd support to pfncaches introduces additional
> sources of pfncache invalidation beyond the MMU notifier path.  The
> existing mn_* naming implies that they are only relevant to MMU
> notifiers, which is no longer true.

I very strongly disagree.  Except for kvm_swap_active_memslots() and
kvm_create_vm(), literally every function here has mmu_notifier in its name.

They are no longer used only for the _kernel's_ MMU-notifier implementation,
but they're still very much scoped explicitly to KVM's overarching MMU notification
system.

If we want to come up with a "better" name, then it needs to capture that somewhere
in the prefix.  Because e.g. invalidate_lock is way, way too generic.  I read that
and my very first question is "invalidate what, exactly?".  Ditto for
memslots_update_rcuwait and pretty much every other field.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache
  2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
                   ` (6 preceding siblings ...)
  2026-03-11 12:04 ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache David Woodhouse
@ 2026-03-11 22:32 ` Sean Christopherson
  7 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2026-03-11 22:32 UTC (permalink / raw)
  To: Takahiro Itazuri
  Cc: kvm, Paolo Bonzini, Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman,
	David Hildenbrand, David Woodhouse, Paul Durrant, Nikita Kalyazin,
	Patrick Roy, Takahiro Itazuri, Fred Griffoul

+Fred

On Tue, Mar 10, 2026, Takahiro Itazuri wrote:
> [ based on v6.18 with [1] ]
> 
> This patch series is another follow-up to RFC v1 with minor fixes of RFC
> v2.  (This is still labelled RFC since its dependency [1] has not yet
> been merged.)  This change was tested for guest_memfd created with
> GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_NO_DIRECT_MAP in the feature
> branch of Firecracker [2].
> 
> === Problem Statement ===
> 
> gfn_to_pfn_cache (a.k.a. pfncache) does not work with guest_memfd.  As
> of today, pfncaches resolve PFNs via hva_to_pfn(), which requires a
> userspace mapping and relies on GUP.  This does not work for guest_memfd
> in the following two ways:
> 
>   * guest_memfd created with GUEST_MEMFD_FLAG_MMAP does not have a
>     userspace mapping due to the nature of private memory.
> 
>   * guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP uses an
>     AS_NO_DIRECT_MAP mapping, which is rejected by GUP.

First off, I'm _very_ excited (like, super duper excited) that y'all are
contributing upstream, and especially that guest_memfd is getting traction.

Second, I acknowledge that I am quite, let's say "particular", in my reviews.
Some Googlers have made joking bets about how many revisions will be required to
"get past Sean".

Third, I also fully realize that the ability to engage upstream is almost always
beyond the control of individual engineers.

Fourth, what I am about to say/type applies to many companies, Google very much
included.  In fact, our less-than-awesome engagement in other areas of the kernel
is in large part *why* I'm typing this: I've seen what happens when upstream gets
too frustrated with one-sided relationships, and I want to try and foster a healthy
relationship instead of getting to a point where I (and/or others) are so grumpy
that it becomes a horrible experience for everyone.

All that said, I'm a bit frustrated when it comes to Amazon's engagement, and to
pfncache in particular.  I have invested *significant* time and energy over the
last few years reviewing and collaborating on several series that, for whatever
reason, were completely abandoned.  Off the top of my head:

 - Coalesced MMIO [https://lore.kernel.org/all/20240820133333.1724191-1-ilstam@amazon.com]
 - kvmclock mess [https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org]
 - pfncache optimizations [https://lore.kernel.org/all/20240821202814.711673-1-dwmw2@infradead.org]

Again, I understand that priorities shift, e.g. that the whole coalesced MMIO
thing may be obsolete.  But kvmclock is most definitely still a mess (I *really*
want the above series to land), and obviously y'all still care about pfncache.

What makes me especially sensitive to pfncache is that, AFAIK, AWS is the only
user of kernel-unmanaged guest memory, i.e. is the only user that _needs_ things
like pfncache and kvm_vcpu_map().  And that suite of features in particular has
been a source of pain, both in terms of bugs and ongoing maintenance cost.

Which, on its own, is totally fine.  By accepting code upstream we're also
accepting responsibility for fixing and maintaining the code.

Where it becomes a problem, at least for me, is when I invest a lot of energy in
reviewing a series and brainstorming solutions, only for the series/thing to be
abandoned for whatever reason.  And seeing a new feature-of-the-week come along
from the same company only exacerbates things.

pfncache is especially frustrating because kernel-unmanaged guest memory isn't
exactly trivial to support in KVM, and it's obviously important enough to add
support for things like nVMX and guest_memfd, but yet not important enough to
bother putting in the effort to land the optimizations.

By no means am I saying "no".  This series and Fred's nVMX series are very much
on my todo list.  But reviews may be slow and the bar for inclusion may be
higher than you feel is justified.

But the main reason I typed all that up: please communicate to your leadership
that sporadic, unpredictable engagement is making upstream a bit grumpy.  If your
company is anything like mine, I know all too well that it's very difficult to
quantify the benefits of staying engaged with upstream, and thus very difficult
to get prioritized.  But leadership tends to react quite quickly to "this is causing
problems *now*".  We're nowhere near things being truly problematic, but maybe if
you can complain a little now, you won't have to complain a lot later, and everyone
will be happier.

P.S. I'm going to be offline for the next two weeks, so for completely unrelated
reasons, this first round of reviews is going to be extra slow.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to
  2026-03-11 12:04 ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache David Woodhouse
@ 2026-03-12 14:02   ` Takahiro Itazuri
  0 siblings, 0 replies; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-12 14:02 UTC (permalink / raw)
  To: dwmw2
  Cc: david, itazur, jackmanb, kalyazin, kvm, patrick.roy, pbonzini,
	pdurrant, seanjc, tabba, vkuznets, zulinx86

On Wed, 11 Mar 2026 13:04:13 +0100, David Woodhouse wrote:
> On Tue, 2026-03-10 at 06:36 +0000, Takahiro Itazuri wrote:
> > [ based on v6.18 with [1] ]
> >
> > This patch series is another follow-up to RFC v1 with minor fixes of RFC
> > v2.  (This is still labelled RFC since its dependency [1] has not yet
> > been merged.)  This change was tested for guest_memfd created with
> > GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_NO_DIRECT_MAP in the feature
> > branch of Firecracker [2].
> 
> This all looks good to me, thanks. I'd love to see a set of KVM
> selftests to exercise it though.

I'll try to add selftests in the next round.  Thanks for taking a look :)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 3/6] KVM: Rename invalidate_begin to invalidate_start for consistency
  2026-03-11 20:53   ` Sean Christopherson
@ 2026-03-12 14:17     ` Takahiro Itazuri
  0 siblings, 0 replies; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-12 14:17 UTC (permalink / raw)
  To: seanjc
  Cc: david, dwmw2, itazur, jackmanb, kalyazin, kvm, patrick.roy,
	pbonzini, pdurrant, tabba, vkuznets, zulinx86

On Wed, 11 Mar 2026 13:53:00 -0700, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Takahiro Itazuri wrote:
> > Most MMU-related helpers use "_start" suffix.  Align with the prevailing
> > naming convention for consistency across MMU-related codebase.
> > 
> > ```
> > $ git grep -E "invalidate(_range)?_start" | wc -l
> > 123
> > 
> > $ git grep -E "invalidate(_range)?_begin" | wc -l
> > 14
> > ```
> 
> I'm a-ok with the change, but the changelog should make it clear to what KVM is
> conforming.  Very specifically, IMO, all that really matters here is aligning with
> mmu_notifier_ops.invalidate_range_start().
> 
> Because for me, "MMU-related" anything in the context of KVM means just that,
> KVM's MMU code.  I don't care what other MMU-related code outside of KVM uses for
> naming, if that code doesn't interact with KVM in any way.  And for KVM specifically,
> it's a much closer race.
> 
>   $ git grep -E "invalidate(_range)?_begin"  **/kvm | wc -l
>   11
>   $ git grep -E "invalidate(_range)?_start"  **/kvm | wc -l
>   16
> 
> But as above, I'm definitely in favor of matching mmu_notifier_ops.invalidate_range_start().

Fair enough.  I'll update the commit message as you suggested!

On Wed, 11 Mar 2026 13:53:00 -0700, Sean Christopherson wrote:
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 2ea5d2f172f7..618a71894ed1 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1566,7 +1566,7 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
> >  void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> >  #endif
> >  
> > -void kvm_mmu_invalidate_begin(struct kvm *kvm);
> > +void kvm_mmu_invalidate_start(struct kvm *kvm);
> >  void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
> >  void kvm_mmu_invalidate_end(struct kvm *kvm);
> >  bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > index d1094c2d5fb6..8ecf36a84e3b 100644
> > --- a/include/linux/mmu_notifier.h
> > +++ b/include/linux/mmu_notifier.h
> > @@ -134,8 +134,8 @@ struct mmu_notifier_ops {
> >  	 * Invalidation of multiple concurrent ranges may be
> >  	 * optionally permitted by the driver. Either way the
> >  	 * establishment of sptes is forbidden in the range passed to
> > -	 * invalidate_range_begin/end for the whole duration of the
> > -	 * invalidate_range_begin/end critical section.
> > +	 * invalidate_range_start/end for the whole duration of the
> > +	 * invalidate_range_start/end critical section.
> >  	 *
> >  	 * invalidate_range_start() is called when all pages in the
> >  	 * range are still mapped and have at least a refcount of one.
> 
> Please move the include/linux/mmu_notifier.h change to its own standalone patch,
> as in, not even part of this series.  It's an obviously correct change (the comment
> has been wrong since commit cddb8a5c14aa ("mmu-notifiers: core")) and has nothing
> to do with KVM.

You're right.  I'll exclude it from this series.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 5/6] KVM: Rename mn_* invalidate-related fields to generic ones
  2026-03-11 20:57   ` Sean Christopherson
@ 2026-03-12 14:33     ` Takahiro Itazuri
  0 siblings, 0 replies; 14+ messages in thread
From: Takahiro Itazuri @ 2026-03-12 14:33 UTC (permalink / raw)
  To: seanjc
  Cc: david, dwmw2, itazur, jackmanb, kalyazin, kvm, patrick.roy,
	pbonzini, pdurrant, tabba, vkuznets, zulinx86

On Wed, 11 Mar 2026 13:57:30 -0700, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Takahiro Itazuri wrote:
> > The addition of guest_memfd support to pfncaches introduces additional
> > sources of pfncache invalidation beyond the MMU notifier path.  The
> > existing mn_* naming implies that they are only relevant to MMU
> > notifiers, which is no longer true.
> 
> I very strongly disagree.  Except for kvm_swap_active_memslots() and
> kvm_create_vm(), literally every function here has mmu_notifier in its name.
> 
> They are no longer used only for the _kernel's_ MMU-notifier implementation,
> but they're still very much scoped explicitly to KVM's overarching MMU notification
> system.
>
> If we want to come up with a "better" name, then it needs to capture that somewhere
> in the prefix.  Because e.g. invalidate_lock is way, way too generic.  I read that
> and my very first question is "invalidate what, exactly?".  Ditto for
> memslots_update_rcuwait and pretty much every other field.

That makes sense to me.  I only had the kernel's MMU notifier in mind
at that time, which is why I thought this renaming would be appropriate.

Since I'm not set on finding a better name, I'll leave these fields as
they are.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-03-12 14:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-10  6:36 [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
2026-03-10  6:41 ` [RFC PATCH v3 1/6] KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs Takahiro Itazuri
2026-03-10  6:43 ` [RFC PATCH v3 2/6] KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP Takahiro Itazuri
2026-03-10  6:43 ` [RFC PATCH v3 3/6] KVM: Rename invalidate_begin to invalidate_start for consistency Takahiro Itazuri
2026-03-11 20:53   ` Sean Christopherson
2026-03-12 14:17     ` Takahiro Itazuri
2026-03-10  6:43 ` [RFC PATCH v3 4/6] KVM: pfncache: Rename invalidate_start() helper Takahiro Itazuri
2026-03-10  6:44 ` [RFC PATCH v3 5/6] KVM: Rename mn_* invalidate-related fields to generic ones Takahiro Itazuri
2026-03-11 20:57   ` Sean Christopherson
2026-03-12 14:33     ` Takahiro Itazuri
2026-03-10  6:44 ` [RFC PATCH v3 6/6] KVM: pfncache: Invalidate on gmem invalidation and memattr updates Takahiro Itazuri
2026-03-11 12:04 ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache David Woodhouse
2026-03-12 14:02   ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
2026-03-11 22:32 ` [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox