* [RFC PATCH v4 1/7] KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs
2026-04-20 15:46 [RFC PATCH v4 0/7] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
@ 2026-04-20 15:46 ` Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 2/7] KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP Takahiro Itazuri
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Takahiro Itazuri @ 2026-04-20 15:46 UTC (permalink / raw)
To: kvm, Sean Christopherson, Paolo Bonzini
Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
Patrick Roy, Derek Manwaring, Alina Cernea, Michael Zoumboulakis,
Takahiro Itazuri, Takahiro Itazuri
Currently, pfncaches always resolve PFNs via hva_to_pfn(), which
requires a userspace mapping and relies on GUP. This does not work for
guest_memfd in the following two ways:
* guest_memfd created without MMAP flag does not have a userspace
mapping.
* guest_memfd created with NO_DIRECT_MAP flag uses an AS_NO_DIRECT_MAP
mapping, which is rejected by GUP.
Resolve PFNs via kvm_gmem_get_pfn() for guest_memfd-backed and GPA-based
pfncaches. Otherwise, fall back to the existing hva_to_pfn().
The current implementation does not support HVA-based pfncaches for
NO_DIRECT_MAP guest_memfd. HVA-based pfncaches do not store
memslot/GPA context, so they cannot determine whether the target is
guest_memfd-backed and always fall back to hva_to_pfn(). Adding a
memslot/GPA lookup is possible but would add overhead to all HVA-based
pfncache activations and refreshes. At the time of writing, only Xen
uses HVA-based pfncaches.
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
virt/kvm/pfncache.c | 66 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 54 insertions(+), 12 deletions(-)
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 728d2c1b488a..ad41cf3e8df4 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -152,7 +152,53 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
return kvm->mmu_invalidate_seq != mmu_seq;
}
-static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
+/*
+ * Determine whether a GPA-based pfncache is backed by guest_memfd, i.e. needs
+ * to be resolved via kvm_gmem_get_pfn() rather than GUP.
+ *
+ * The caller holds gpc->refresh_lock, but does not hold gpc->lock nor
+ * kvm->slots_lock. Reading slot->flags (via kvm_slot_has_gmem() and
+ * kvm_memslot_is_gmem_only()) is safe because memslot changes bump
+ * slots->generation, which is detected in kvm_gpc_check(), forcing callers
+ * to invoke kvm_gpc_refresh().
+ *
+ * Looking up memory attributes (via kvm_mem_is_private()) can race with
+ * KVM_SET_MEMORY_ATTRIBUTES, which takes kvm->slots_lock to serialize
+ * writers but doesn't exclude lockless readers. Handling that race is deferred
+ * to a subsequent commit that wires up pfncache invalidation for gmem events.
+ */
+static inline bool gpc_is_gmem_backed(struct gfn_to_pfn_cache *gpc)
+{
+ lockdep_assert_held(&gpc->refresh_lock);
+
+ /* For HVA-based pfncaches, memslot is NULL */
+ return gpc->memslot && kvm_slot_has_gmem(gpc->memslot) &&
+ (kvm_memslot_is_gmem_only(gpc->memslot) ||
+ kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpc->gpa)));
+}
+
+static kvm_pfn_t gpc_to_pfn(struct gfn_to_pfn_cache *gpc, struct page **page)
+{
+ if (gpc_is_gmem_backed(gpc)) {
+ kvm_pfn_t pfn;
+
+ if (kvm_gmem_get_pfn(gpc->kvm, gpc->memslot,
+ gpa_to_gfn(gpc->gpa), &pfn, page, NULL))
+ return KVM_PFN_ERR_FAULT;
+
+ return pfn;
+ }
+
+ return hva_to_pfn(&(struct kvm_follow_pfn) {
+ .slot = gpc->memslot,
+ .gfn = gpa_to_gfn(gpc->gpa),
+ .flags = FOLL_WRITE,
+ .hva = gpc->uhva,
+ .refcounted_page = page,
+ });
+}
+
+static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
{
/* Note, the new page offset may be different than the old! */
void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
@@ -161,14 +207,6 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
unsigned long mmu_seq;
struct page *page;
- struct kvm_follow_pfn kfp = {
- .slot = gpc->memslot,
- .gfn = gpa_to_gfn(gpc->gpa),
- .flags = FOLL_WRITE,
- .hva = gpc->uhva,
- .refcounted_page = &page,
- };
-
lockdep_assert_held(&gpc->refresh_lock);
lockdep_assert_held_write(&gpc->lock);
@@ -206,7 +244,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
cond_resched();
}
- new_pfn = hva_to_pfn(&kfp);
+ new_pfn = gpc_to_pfn(gpc, &page);
if (is_error_noslot_pfn(new_pfn))
goto out_error;
@@ -319,7 +357,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
}
}
- /* Note: the offset must be correct before calling hva_to_pfn_retry() */
+ /* Note: the offset must be correct before calling gpc_to_pfn_retry() */
gpc->uhva += page_offset;
/*
@@ -327,7 +365,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
* drop the lock and do the HVA to PFN lookup again.
*/
if (!gpc->valid || hva_change) {
- ret = hva_to_pfn_retry(gpc);
+ ret = gpc_to_pfn_retry(gpc);
} else {
/*
* If the HVA→PFN mapping was already valid, don't unmap it.
@@ -441,6 +479,10 @@ int kvm_gpc_activate_hva(struct gfn_to_pfn_cache *gpc, unsigned long uhva, unsig
if (!access_ok((void __user *)uhva, len))
return -EINVAL;
+ /*
+ * HVA-based caches always resolve PFNs via GUP (hva_to_pfn()), which
+ * does not work for NO_DIRECT_MAP guest_memfd.
+ */
return __kvm_gpc_activate(gpc, INVALID_GPA, uhva, len);
}
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* [RFC PATCH v4 2/7] KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP
2026-04-20 15:46 [RFC PATCH v4 0/7] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 1/7] KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs Takahiro Itazuri
@ 2026-04-20 15:46 ` Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 3/7] KVM: Rename invalidate_begin to invalidate_start for consistency Takahiro Itazuri
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Takahiro Itazuri @ 2026-04-20 15:46 UTC (permalink / raw)
To: kvm, Sean Christopherson, Paolo Bonzini
Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
Patrick Roy, Derek Manwaring, Alina Cernea, Michael Zoumboulakis,
Takahiro Itazuri, Takahiro Itazuri
Currently, pfncaches map RAM pages via kmap(), which typically returns a
kernel address derived from the direct map. However, guest_memfd
created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP has its direct map removed
and uses an AS_NO_DIRECT_MAP mapping. So kmap() cannot be used in this
case.
pfncaches can be used from atomic context where page faults cannot be
tolerated. Therefore, they cannot fall back to access via a userspace
mapping like KVM does for other accesses to NO_DIRECT_MAP guest_memfd.
To obtain a fault-free kernel host virtual address (KHVA), use vmap()
for NO_DIRECT_MAP pages. Since gpc_map() is the sole producer of KHVA
for pfncaches and only vmap() returns a vmalloc address, gpc_unmap()
can reliably pair vunmap() using is_vmalloc_addr().
Although vm_map_ram() could be faster than vmap(), mixing short-lived
and long-lived vm_map_ram() can lead to fragmentation. For this reason,
vm_map_ram() is recommended only for short-lived ones. Since pfncaches
typically have a lifetime comparable to that of the VM, vm_map_ram() is
deliberately not used here.
pfncaches are not dynamically allocated but are statically allocated on
a per-VM and per-vCPU basis. For a normal VM (i.e. non-Xen), there is
one pfncache per vCPU. For a Xen VM, there is one per-VM pfncache and
five per-vCPU pfncaches. Given the maximum of 1024 vCPUs, a normal VM
can have up to 1024 pfncaches, consuming 4 MB of virtual address space.
A Xen VM can have up to 5121 pfncaches, consuming approximately 20 MB of
virtual address space. Although the vmalloc area is limited on 32-bit
systems, it should be large enough and typically tens of TB on 64-bit
systems (e.g. 32 TB for 4-level paging and 12800 TB for 5-level paging
on x86_64). If virtual address space exhaustion becomes a concern,
migration to an mm-local region could be considered in the future. Note
that vmap() and vm_map_ram() only create virtual mappings to existing
pages; they do not allocate new physical pages.
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
virt/kvm/pfncache.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index ad41cf3e8df4..682dc3ba2216 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -16,6 +16,7 @@
#include <linux/highmem.h>
#include <linux/module.h>
#include <linux/errno.h>
+#include <linux/pagemap.h>
#include "kvm_mm.h"
@@ -98,8 +99,19 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
static void *gpc_map(kvm_pfn_t pfn)
{
- if (pfn_valid(pfn))
- return kmap(pfn_to_page(pfn));
+ if (pfn_valid(pfn)) {
+ struct page *page = pfn_to_page(pfn);
+ struct page *head = compound_head(page);
+ struct address_space *mapping = READ_ONCE(head->mapping);
+
+ if (mapping && mapping_no_direct_map(mapping)) {
+ struct page *pages[] = { page };
+
+ return vmap(pages, 1, VM_MAP, PAGE_KERNEL);
+ }
+
+ return kmap(page);
+ }
#ifdef CONFIG_HAS_IOMEM
return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
@@ -115,7 +127,15 @@ static void gpc_unmap(kvm_pfn_t pfn, void *khva)
return;
if (pfn_valid(pfn)) {
- kunmap(pfn_to_page(pfn));
+ /*
+ * For valid PFNs, gpc_map() returns either a kmap() address
+ * (non-vmalloc) or a vmap() address (vmalloc).
+ */
+ if (is_vmalloc_addr(khva))
+ vunmap(khva);
+ else
+ kunmap(pfn_to_page(pfn));
+
return;
}
@@ -250,8 +270,11 @@ static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
/*
* Obtain a new kernel mapping if KVM itself will access the
- * pfn. Note, kmap() and memremap() can both sleep, so this
- * too must be done outside of gpc->lock!
+ * pfn. Note, kmap(), vmap() and memremap() can all sleep, so
+ * this too must be done outside of gpc->lock!
+ * Note that even though gpc->lock is dropped, it's still fine
+ * to read gpc->pfn and other fields because gpc->refresh_lock
+ * mutex prevents them from being updated.
*/
if (new_pfn == gpc->pfn)
new_khva = old_khva;
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* [RFC PATCH v4 3/7] KVM: Rename invalidate_begin to invalidate_start for consistency
2026-04-20 15:46 [RFC PATCH v4 0/7] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 1/7] KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 2/7] KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP Takahiro Itazuri
@ 2026-04-20 15:46 ` Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 4/7] KVM: pfncache: Rename invalidate_start() helper Takahiro Itazuri
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Takahiro Itazuri @ 2026-04-20 15:46 UTC (permalink / raw)
To: kvm, Sean Christopherson, Paolo Bonzini
Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
Patrick Roy, Derek Manwaring, Alina Cernea, Michael Zoumboulakis,
Takahiro Itazuri, Takahiro Itazuri
Rename kvm_mmu_invalidate_begin() to kvm_mmu_invalidate_start() to
align with mmu_notifier_ops.invalidate_range_start(), which is the
callback that ultimately drives KVM's MMU invalidation.
While the naming within KVM itself is a close split between "_begin" and
"_start", conforming to the mmu_notifier_ops naming is the right call
since invalidate_range_start() is the external API that KVM hooks into.
$ git grep -E "invalidate(_range)?_begin" **/kvm | wc -l
11
$ git grep -E "invalidate(_range)?_start" **/kvm | wc -l
16
No functional change intended.
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
include/linux/kvm_host.h | 2 +-
virt/kvm/guest_memfd.c | 14 +++++++-------
virt/kvm/kvm_main.c | 6 +++---
4 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d3e705ac4c6f..e82a357e2219 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6859,7 +6859,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
write_lock(&kvm->mmu_lock);
- kvm_mmu_invalidate_begin(kvm);
+ kvm_mmu_invalidate_start(kvm);
kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2ea5d2f172f7..618a71894ed1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1566,7 +1566,7 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
#endif
-void kvm_mmu_invalidate_begin(struct kvm *kvm);
+void kvm_mmu_invalidate_start(struct kvm *kvm);
void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
void kvm_mmu_invalidate_end(struct kvm *kvm);
bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5d6e966d4f32..79f34dad0c2f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -206,7 +206,7 @@ static enum kvm_gfn_range_filter kvm_gmem_get_invalidate_filter(struct inode *in
return KVM_FILTER_PRIVATE;
}
-static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
+static void __kvm_gmem_invalidate_start(struct gmem_file *f, pgoff_t start,
pgoff_t end,
enum kvm_gfn_range_filter attr_filter)
{
@@ -230,7 +230,7 @@ static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
found_memslot = true;
KVM_MMU_LOCK(kvm);
- kvm_mmu_invalidate_begin(kvm);
+ kvm_mmu_invalidate_start(kvm);
}
flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
@@ -243,7 +243,7 @@ static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
KVM_MMU_UNLOCK(kvm);
}
-static void kvm_gmem_invalidate_begin(struct inode *inode, pgoff_t start,
+static void kvm_gmem_invalidate_start(struct inode *inode, pgoff_t start,
pgoff_t end)
{
enum kvm_gfn_range_filter attr_filter;
@@ -252,7 +252,7 @@ static void kvm_gmem_invalidate_begin(struct inode *inode, pgoff_t start,
attr_filter = kvm_gmem_get_invalidate_filter(inode);
kvm_gmem_for_each_file(f, inode->i_mapping)
- __kvm_gmem_invalidate_begin(f, start, end, attr_filter);
+ __kvm_gmem_invalidate_start(f, start, end, attr_filter);
}
static void __kvm_gmem_invalidate_end(struct gmem_file *f, pgoff_t start,
@@ -287,7 +287,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
*/
filemap_invalidate_lock(inode->i_mapping);
- kvm_gmem_invalidate_begin(inode, start, end);
+ kvm_gmem_invalidate_start(inode, start, end);
truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
@@ -401,7 +401,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
* Zap all SPTEs pointed at by this file. Do not free the backing
* memory, as its lifetime is associated with the inode, not the file.
*/
- __kvm_gmem_invalidate_begin(f, 0, -1ul,
+ __kvm_gmem_invalidate_start(f, 0, -1ul,
kvm_gmem_get_invalidate_filter(inode));
__kvm_gmem_invalidate_end(f, 0, -1ul);
@@ -582,7 +582,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
start = folio->index;
end = start + folio_nr_pages(folio);
- kvm_gmem_invalidate_begin(mapping->host, start, end);
+ kvm_gmem_invalidate_start(mapping->host, start, end);
/*
* Do not truncate the range, what action is taken in response to the
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 60a8b7ca8ab4..5871882ff1db 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -678,7 +678,7 @@ static __always_inline int kvm_age_hva_range_no_flush(struct mmu_notifier *mn,
return kvm_age_hva_range(mn, start, end, handler, false);
}
-void kvm_mmu_invalidate_begin(struct kvm *kvm)
+void kvm_mmu_invalidate_start(struct kvm *kvm)
{
lockdep_assert_held_write(&kvm->mmu_lock);
/*
@@ -734,7 +734,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
.start = range->start,
.end = range->end,
.handler = kvm_mmu_unmap_gfn_range,
- .on_lock = kvm_mmu_invalidate_begin,
+ .on_lock = kvm_mmu_invalidate_start,
.flush_on_ret = true,
.may_block = mmu_notifier_range_blockable(range),
};
@@ -2571,7 +2571,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
.end = end,
.arg.attributes = attributes,
.handler = kvm_pre_set_memory_attributes,
- .on_lock = kvm_mmu_invalidate_begin,
+ .on_lock = kvm_mmu_invalidate_start,
.flush_on_ret = true,
.may_block = true,
};
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* [RFC PATCH v4 4/7] KVM: pfncache: Rename invalidate_start() helper
2026-04-20 15:46 [RFC PATCH v4 0/7] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
` (2 preceding siblings ...)
2026-04-20 15:46 ` [RFC PATCH v4 3/7] KVM: Rename invalidate_begin to invalidate_start for consistency Takahiro Itazuri
@ 2026-04-20 15:46 ` Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 5/7] KVM: pfncache: Invalidate on gmem invalidation and memattr updates Takahiro Itazuri
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Takahiro Itazuri @ 2026-04-20 15:46 UTC (permalink / raw)
To: kvm, Sean Christopherson, Paolo Bonzini
Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
Patrick Roy, Derek Manwaring, Alina Cernea, Michael Zoumboulakis,
Takahiro Itazuri, Takahiro Itazuri
Rename gfn_to_pfn_cache_invalidate_start() to
gpc_invalidate_hva_range_start() to explicitly indicate that it takes an
HVA range.
No functional changes intended.
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
virt/kvm/kvm_main.c | 2 +-
virt/kvm/kvm_mm.h | 12 ++++++------
virt/kvm/pfncache.c | 4 ++--
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5871882ff1db..d64e70f8e8e3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -763,7 +763,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
* mn_active_invalidate_count (see above) instead of
* mmu_invalidate_in_progress.
*/
- gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end);
+ gpc_invalidate_hva_range_start(kvm, range->start, range->end);
/*
* If one or more memslots were found and thus zapped, notify arch code
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 9fcc5d5b7f8d..abd8e7d33ab0 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -56,13 +56,13 @@ struct kvm_follow_pfn {
kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp);
#ifdef CONFIG_HAVE_KVM_PFNCACHE
-void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
- unsigned long start,
- unsigned long end);
+void gpc_invalidate_hva_range_start(struct kvm *kvm,
+ unsigned long start,
+ unsigned long end);
#else
-static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
- unsigned long start,
- unsigned long end)
+static inline void gpc_invalidate_hva_range_start(struct kvm *kvm,
+ unsigned long start,
+ unsigned long end)
{
}
#endif /* HAVE_KVM_PFNCACHE */
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 682dc3ba2216..bb4ba3a1b3d9 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -23,8 +23,8 @@
/*
* MMU notifier 'invalidate_range_start' hook.
*/
-void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
- unsigned long end)
+void gpc_invalidate_hva_range_start(struct kvm *kvm, unsigned long start,
+ unsigned long end)
{
struct gfn_to_pfn_cache *gpc;
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* [RFC PATCH v4 5/7] KVM: pfncache: Invalidate on gmem invalidation and memattr updates
2026-04-20 15:46 [RFC PATCH v4 0/7] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
` (3 preceding siblings ...)
2026-04-20 15:46 ` [RFC PATCH v4 4/7] KVM: pfncache: Rename invalidate_start() helper Takahiro Itazuri
@ 2026-04-20 15:46 ` Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 6/7] KVM: selftests: Test pfncache with gmem-backed memory Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 7/7] KVM: selftests: Test pfncache invalidation for " Takahiro Itazuri
6 siblings, 0 replies; 8+ messages in thread
From: Takahiro Itazuri @ 2026-04-20 15:46 UTC (permalink / raw)
To: kvm, Sean Christopherson, Paolo Bonzini
Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
Patrick Roy, Derek Manwaring, Alina Cernea, Michael Zoumboulakis,
Takahiro Itazuri, Takahiro Itazuri
Invalidate pfncaches when guest_memfd invalidation or memory attribute
updates render cached PFN resolutions stale.
Reuse mn_active_invalidate_count to synchronize with the existing retry
logic and preserve ordering against mmu_invalidate_seq.
Invalidation needs to be performed using HVA ranges so that both
GPA-based and HVA-based pfncaches are covered. Internally GPA-based
ones translate GPA to memslot/UHVA first and then resolve PFN, while
HVA-based ones only resolve PFN and do not store memslot/GPA context.
Technically, it is possible to make HVA-based pfncaches search the
corresponding memslot/GPA when activated/refreshed, but it would add
overhead to a greater or lesser extent, regardless of whether the memory
is guest_memfd-backed or not. At the time of writing, only Xen uses
HVA-based pfncaches.
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
virt/kvm/guest_memfd.c | 50 ++++++++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 47 ++++++++++++++++++++++++++++++++++++++-
virt/kvm/pfncache.c | 21 ++++++++++--------
3 files changed, 108 insertions(+), 10 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 79f34dad0c2f..011fd205ac7e 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -215,6 +215,33 @@ static void __kvm_gmem_invalidate_start(struct gmem_file *f, pgoff_t start,
struct kvm *kvm = f->kvm;
unsigned long index;
+ /*
+ * Prevent pfncaches from being activated / refreshed using stale PFN
+ * resolutions. To invalidate pfncaches _before_ invalidating the
+ * secondary MMUs (i.e. without acquiring mmu_lock), pfncaches must use
+ * mn_active_invalidate_count instead of mmu_invalidate_in_progress.
+ */
+ spin_lock(&kvm->mn_invalidate_lock);
+ kvm->mn_active_invalidate_count++;
+ spin_unlock(&kvm->mn_invalidate_lock);
+
+ /*
+ * Invalidation of pfncaches must be done using a HVA range. pfncaches
+ * can be either GPA-based or HVA-based, and all pfncaches store uhva
+ * while HVA-based pfncaches do not have gpa/memslot context. Thus,
+ * using GFN ranges would miss invalidating HVA-based ones.
+ */
+ xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
+ pgoff_t pgoff = slot->gmem.pgoff;
+ gfn_t gfn_start = slot->base_gfn + max(pgoff, start) - pgoff;
+ gfn_t gfn_end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff;
+
+ unsigned long hva_start = gfn_to_hva_memslot(slot, gfn_start);
+ unsigned long hva_end = hva_start + (gfn_end - gfn_start) * PAGE_SIZE;
+
+ gpc_invalidate_hva_range_start(kvm, hva_start, hva_end);
+ }
+
xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
pgoff_t pgoff = slot->gmem.pgoff;
@@ -259,12 +286,35 @@ static void __kvm_gmem_invalidate_end(struct gmem_file *f, pgoff_t start,
pgoff_t end)
{
struct kvm *kvm = f->kvm;
+ bool wake;
if (xa_find(&f->bindings, &start, end - 1, XA_PRESENT)) {
KVM_MMU_LOCK(kvm);
kvm_mmu_invalidate_end(kvm);
KVM_MMU_UNLOCK(kvm);
}
+
+ /*
+ * This must be done after the increment of mmu_invalidate_seq and
+ * smp_wmb() in kvm_mmu_invalidate_end() to guarantee that
+ * gpc_invalidate_retry() observes either the old (non-zero)
+ * mn_active_invalidate_count or the new (incremented) mmu_invalidate_seq.
+ */
+ spin_lock(&kvm->mn_invalidate_lock);
+ if (!WARN_ON_ONCE(!kvm->mn_active_invalidate_count))
+ kvm->mn_active_invalidate_count--;
+ wake = !kvm->mn_active_invalidate_count;
+ spin_unlock(&kvm->mn_invalidate_lock);
+
+ /*
+ * guest_memfd invalidation itself doesn't need to block active memslots
+ * swap as bindings updates are serialized by filemap_invalidate_lock().
+ * However, mn_active_invalidate_count is shared with the MMU notifier
+ * path, so the waiter must be woken when mn_active_invalidate_count
+ * drops to zero.
+ */
+ if (wake)
+ rcuwait_wake_up(&kvm->mn_memslots_update_rcuwait);
}
static void kvm_gmem_invalidate_end(struct inode *inode, pgoff_t start,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d64e70f8e8e3..b6d0a22fee79 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2583,9 +2583,11 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
.on_lock = kvm_mmu_invalidate_end,
.may_block = true,
};
+ struct kvm_memslots *slots = kvm_memslots(kvm);
+ struct kvm_memory_slot *slot;
unsigned long i;
void *entry;
- int r = 0;
+ int r = 0, bkt;
entry = attributes ? xa_mk_value(attributes) : NULL;
@@ -2609,6 +2611,34 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
cond_resched();
}
+ /*
+ * Prevent pfncaches from being activated / refreshed using stale PFN
+ * resolutions. To invalidate pfncaches _before_ invalidating the
+ * secondary MMUs (i.e. without acquiring mmu_lock), pfncaches must use
+ * mn_active_invalidate_count instead of mmu_invalidate_in_progress.
+ */
+ spin_lock(&kvm->mn_invalidate_lock);
+ kvm->mn_active_invalidate_count++;
+ spin_unlock(&kvm->mn_invalidate_lock);
+
+ /*
+ * Invalidation of pfncaches must be done using a HVA range. pfncaches
+ * can be either GPA-based or HVA-based, and all pfncaches store uhva
+ * while HVA-based pfncaches do not have gpa/memslot info. Thus,
+ * using GFN ranges would miss invalidating HVA-based ones.
+ */
+ kvm_for_each_memslot(slot, bkt, slots) {
+ gfn_t gfn_start = max(start, slot->base_gfn);
+ gfn_t gfn_end = min(end, slot->base_gfn + slot->npages);
+
+ if (gfn_start < gfn_end) {
+ unsigned long hva_start = gfn_to_hva_memslot(slot, gfn_start);
+ unsigned long hva_end = hva_start + (gfn_end - gfn_start) * PAGE_SIZE;
+
+ gpc_invalidate_hva_range_start(kvm, hva_start, hva_end);
+ }
+ }
+
kvm_handle_gfn_range(kvm, &pre_set_range);
for (i = start; i < end; i++) {
@@ -2620,6 +2650,21 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
kvm_handle_gfn_range(kvm, &post_set_range);
+ /*
+ * This must be done after the increment of mmu_invalidate_seq and
+ * smp_wmb() in kvm_mmu_invalidate_end() to guarantee that
+ * gpc_invalidate_retry() observes either the old (non-zero)
+ * mn_active_invalidate_count or the new (incremented) mmu_invalidate_seq.
+ *
+ * mn_memslots_update_rcuwait does not need to be woken when
+ * mn_active_invalidate_count drops to zero because active memslots swap
+ * is also done while holding slots_lock.
+ */
+ spin_lock(&kvm->mn_invalidate_lock);
+ if (!WARN_ON_ONCE(!kvm->mn_active_invalidate_count))
+ kvm->mn_active_invalidate_count--;
+ spin_unlock(&kvm->mn_invalidate_lock);
+
out_unlock:
mutex_unlock(&kvm->slots_lock);
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index bb4ba3a1b3d9..0e7a0f64e14b 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -144,7 +144,7 @@ static void gpc_unmap(kvm_pfn_t pfn, void *khva)
#endif
}
-static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
+static inline bool gpc_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
{
/*
* mn_active_invalidate_count acts for all intents and purposes
@@ -178,14 +178,17 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
*
* The caller holds gpc->refresh_lock, but does not hold gpc->lock nor
* kvm->slots_lock. Reading slot->flags (via kvm_slot_has_gmem() and
- * kvm_memslot_is_gmem_only()) is safe because memslot changes bump
- * slots->generation, which is detected in kvm_gpc_check(), forcing callers
- * to invoke kvm_gpc_refresh().
+ * kvm_memslot_is_gmem_only()) and looking up memory attributes (via
+ * kvm_mem_is_private()) without those locks is safe because:
*
- * Looking up memory attributes (via kvm_mem_is_private()) can race with
- * KVM_SET_MEMORY_ATTRIBUTES, which takes kvm->slots_lock to serialize
- * writers but doesn't exclude lockless readers. Handling that race is deferred
- * to a subsequent commit that wires up pfncache invalidation for gmem events.
+ * - memslot changes bump slots->generation, which is detected in
+ * kvm_gpc_check(), forcing callers to invoke kvm_gpc_refresh().
+ *
+ * - Memory attribute changes and gmem invalidations elevate
+ * mn_active_invalidate_count and bump mmu_invalidate_seq, bracketing the
+ * pfncache invalidation. gpc_invalidate_retry() observes either of these
+ * changes and forces a retry of the refresh loop in gpc_to_pfn_retry(), so
+ * any stale value read here will be re-evaluated.
*/
static inline bool gpc_is_gmem_backed(struct gfn_to_pfn_cache *gpc)
{
@@ -293,7 +296,7 @@ static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
* attempting to refresh.
*/
WARN_ON_ONCE(gpc->valid);
- } while (mmu_notifier_retry_cache(gpc->kvm, mmu_seq));
+ } while (gpc_invalidate_retry(gpc->kvm, mmu_seq));
gpc->valid = true;
gpc->pfn = new_pfn;
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* [RFC PATCH v4 6/7] KVM: selftests: Test pfncache with gmem-backed memory
2026-04-20 15:46 [RFC PATCH v4 0/7] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
` (4 preceding siblings ...)
2026-04-20 15:46 ` [RFC PATCH v4 5/7] KVM: pfncache: Invalidate on gmem invalidation and memattr updates Takahiro Itazuri
@ 2026-04-20 15:46 ` Takahiro Itazuri
2026-04-20 15:46 ` [RFC PATCH v4 7/7] KVM: selftests: Test pfncache invalidation for " Takahiro Itazuri
6 siblings, 0 replies; 8+ messages in thread
From: Takahiro Itazuri @ 2026-04-20 15:46 UTC (permalink / raw)
To: kvm, Sean Christopherson, Paolo Bonzini
Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
Patrick Roy, Derek Manwaring, Alina Cernea, Michael Zoumboulakis,
Takahiro Itazuri, Takahiro Itazuri
Add a selftest that exercises pfncache (gfn_to_pfn_cache) with
guest_memfd-backed memory by using kvm-clock as the test vehicle.
The test creates two VM configurations:
- NO_DIRECT_MAP VM: All memory is gmem-backed (MMAP | INIT_SHARED |
NO_DIRECT_MAP). KVM_MEMSLOT_GMEM_ONLY is set, so pfncache resolves
PFNs via kvm_gmem_get_pfn() and maps KHVAs via vmap().
- SW_PROTECTED_VM: Memory starts private. pfncache resolves PFNs via
kvm_gmem_get_pfn() for private pages. This validates the private
memory pfncache path for future extensibility (e.g. pKVM-like VMs).
The guest writes MSR_KVM_SYSTEM_TIME_NEW (triggering kvm_gpc_activate()
internally), reads the pvclock structure to verify KVM wrote through the
pfncache KHVA correctly, and reports the kvm-clock value to the host for
bounds checking against KVM_GET_CLOCK.
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/x86/pfncache_gmem_test.c | 193 ++++++++++++++++++
2 files changed, 194 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/pfncache_gmem_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 148d427ff24b..faf454d64e4e 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -92,6 +92,7 @@ TEST_GEN_PROGS_x86 += x86/nested_emulation_test
TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
TEST_GEN_PROGS_x86 += x86/platform_info_test
TEST_GEN_PROGS_x86 += x86/pmu_counters_test
+TEST_GEN_PROGS_x86 += x86/pfncache_gmem_test
TEST_GEN_PROGS_x86 += x86/pmu_event_filter_test
TEST_GEN_PROGS_x86 += x86/private_mem_conversions_test
TEST_GEN_PROGS_x86 += x86/private_mem_kvm_exits_test
diff --git a/tools/testing/selftests/kvm/x86/pfncache_gmem_test.c b/tools/testing/selftests/kvm/x86/pfncache_gmem_test.c
new file mode 100644
index 000000000000..c61b161f3e0c
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/pfncache_gmem_test.c
@@ -0,0 +1,193 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025, Amazon.com, Inc. or its affiliates.
+ *
+ * Test pfncache (gfn_to_pfn_cache) with guest_memfd-backed memory.
+ *
+ * Exercises pfncache indirectly through kvm-clock: the guest writes
+ * MSR_KVM_SYSTEM_TIME_NEW (triggering kvm_gpc_activate() in KVM), KVM writes
+ * pvclock data through the pfncache's KHVA, and the guest reads the
+ * pvclock_vcpu_time_info structure to verify correctness.
+ *
+ * Two VM configurations exercise distinct pfncache code paths:
+ *
+ * - NO_DIRECT_MAP VM: All memory is gmem-backed and shared (MMAP |
+ * INIT_SHARED | NO_DIRECT_MAP). KVM_MEMSLOT_GMEM_ONLY is set, so
+ * gpc_is_gmem_backed() always returns true. PFN resolution goes through
+ * kvm_gmem_get_pfn() and KHVA mapping uses vmap().
+ *
+ * - SW_PROTECTED_VM: Memory starts private. pfncache uses
+ * kvm_gmem_get_pfn() for private pages. This validates the private
+ * memory pfncache path for future extensibility (e.g. pKVM-like VMs).
+ */
+#include <asm/kvm_para.h>
+#include <asm/pvclock.h>
+#include <asm/pvclock-abi.h>
+#include <stdint.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+/* Report the guest-computed kvm-clock value (in arg[2]) to the host. */
+#define GUEST_SYNC_CLOCK(__stage, __val) \
+ GUEST_SYNC_ARGS(__stage, __val, 0, 0, 0)
+
+/*
+ * Guest entry point: enable kvm-clock at @pvti_pa, then loop forever reading
+ * the pvclock structure at @pvti and reporting the computed clock value to
+ * the host via GUEST_SYNC_CLOCK (one sync per host vcpu_run()).
+ */
+static void guest_main(vm_paddr_t pvti_pa, struct pvclock_vcpu_time_info *pvti)
+{
+ int stage = 0;
+
+ /* Writing the MSR makes KVM activate a pfncache for the pvclock page. */
+ wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
+
+ for (;;) {
+ uint64_t clock;
+
+ /* Non-zero system_time proves KVM wrote through the pfncache KHVA. */
+ GUEST_ASSERT(pvti->system_time != 0);
+ clock = __pvclock_read_cycles(pvti, rdtsc());
+ GUEST_SYNC_CLOCK(stage++, clock);
+ }
+}
+
+/*
+ * Run @vcpu until its next GUEST_SYNC and verify the guest-reported kvm-clock
+ * value: it must fall within host KVM_GET_CLOCK samples taken around the run,
+ * and be strictly greater than @prev_clock when @prev_clock is non-zero.
+ * Returns the guest-reported clock so callers can chain monotonicity checks.
+ */
+static uint64_t run_and_verify_kvm_clock(struct kvm_vcpu *vcpu,
+ uint64_t prev_clock)
+{
+ struct kvm_clock_data start, end;
+ struct ucall uc;
+ uint64_t guest_clock;
+
+ /* Bracket the run with host clock samples to bound the guest's value. */
+ vm_ioctl(vcpu->vm, KVM_GET_CLOCK, &start);
+ vcpu_run(vcpu);
+ vm_ioctl(vcpu->vm, KVM_GET_CLOCK, &end);
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_SYNC:
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ /* unreachable */
+ return 0;
+ default:
+ TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+ }
+
+ /* args[2] is __val from GUEST_SYNC_CLOCK (args[0..1] are stage magic). */
+ guest_clock = uc.args[2];
+
+ TEST_ASSERT(start.clock <= guest_clock && guest_clock <= end.clock,
+ "guest clock %llu ns not in expected range [%llu, %llu] ns",
+ (unsigned long long)guest_clock,
+ (unsigned long long)start.clock,
+ (unsigned long long)end.clock);
+
+ if (prev_clock)
+ TEST_ASSERT(guest_clock > prev_clock,
+ "guest clock %llu ns not monotonic (prev %llu ns)",
+ (unsigned long long)guest_clock,
+ (unsigned long long)prev_clock);
+
+ pr_info(" guest clock %llu ns, expected range [%llu, %llu] ns\n",
+ (unsigned long long)guest_clock,
+ (unsigned long long)start.clock,
+ (unsigned long long)end.clock);
+
+ return guest_clock;
+}
+
+/* Dedicated memslot/GPA for the pvclock page in the SW_PROTECTED_VM case. */
+#define PVCLOCK_SLOT 10
+#define PVCLOCK_GPA (1ULL << 32)
+
+/*
+ * Create a one-vCPU VM of the given @shape, allocate/map the pvclock page,
+ * and pass its GPA and guest virtual pointer to guest_main() via vcpu args.
+ * Returns the VM; the vCPU is returned through @vcpu_out.
+ */
+static struct kvm_vm *setup_vm(struct vm_shape shape,
+ struct kvm_vcpu **vcpu_out)
+{
+ struct kvm_vm *vm;
+
+ vm = __vm_create_shape_with_one_vcpu(shape, vcpu_out, 0, guest_main);
+
+ /*
+ * For SW_PROTECTED_VM, the primary memslot doesn't have guest_memfd.
+ * Place the pvclock page in a separate memslot with both anonymous
+ * memory (for shared) and guest_memfd (for private), and mark it
+ * private so that pfncache exercises the kvm_gmem_get_pfn() path.
+ */
+ if (shape.type == KVM_X86_SW_PROTECTED_VM) {
+ int memfd = vm_create_guest_memfd(vm, getpagesize(), 0);
+
+ vm_mem_add(vm, VM_MEM_SRC_ANONYMOUS, PVCLOCK_GPA,
+ PVCLOCK_SLOT, 1, KVM_MEM_GUEST_MEMFD, memfd, 0);
+ /* Identity-map the page so the guest pointer equals the GPA. */
+ virt_map(vm, PVCLOCK_GPA, PVCLOCK_GPA, 1);
+ vcpu_args_set(*vcpu_out, 2, (vm_paddr_t)PVCLOCK_GPA,
+ (struct pvclock_vcpu_time_info *)PVCLOCK_GPA);
+ vm_mem_set_private(vm, PVCLOCK_GPA, getpagesize());
+ } else {
+ /*
+ * Otherwise the page comes from the primary memslot, whose
+ * backing is dictated by shape.src_type (gmem-backed for the
+ * NO_DIRECT_MAP configuration).
+ */
+ vm_vaddr_t pvti_gva;
+ vm_paddr_t pvti_gpa;
+
+ pvti_gva = vm_vaddr_alloc(vm, getpagesize(), 0x10000);
+ pvti_gpa = addr_gva2gpa(vm, pvti_gva);
+ vcpu_args_set(*vcpu_out, 2, pvti_gpa, pvti_gva);
+ }
+
+ return vm;
+}
+
+/*
+ * Exercise pfncache with memory that is entirely guest_memfd-backed and has
+ * no direct map: PFN resolution via kvm_gmem_get_pfn(), KHVA via vmap().
+ */
+static void test_no_direct_map(void)
+{
+ struct vm_shape shape = {
+ .mode = VM_MODE_DEFAULT,
+ .type = VM_TYPE_DEFAULT,
+ .src_type = VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
+ };
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ uint64_t clock = 0;
+
+ pr_info("Testing pfncache with NO_DIRECT_MAP guest_memfd\n");
+
+ vm = setup_vm(shape, &vcpu);
+
+ /* Verify kvm-clock works with gmem-backed pfncache (vmap KHVA) */
+ clock = run_and_verify_kvm_clock(vcpu, clock);
+ clock = run_and_verify_kvm_clock(vcpu, clock);
+
+ kvm_vm_free(vm);
+}
+
+/*
+ * Exercise pfncache with a SW_PROTECTED_VM whose pvclock page is private
+ * (guest_memfd-backed), covering the private-memory kvm_gmem_get_pfn() path.
+ */
+static void test_sw_protected_vm(void)
+{
+ struct vm_shape shape = {
+ .mode = VM_MODE_DEFAULT,
+ .type = KVM_X86_SW_PROTECTED_VM,
+ };
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ uint64_t clock = 0;
+
+ pr_info("Testing pfncache with SW_PROTECTED_VM (guest_memfd-backed private memory)\n");
+
+ vm = setup_vm(shape, &vcpu);
+
+ /* Verify kvm-clock works with gmem-backed private memory */
+ clock = run_and_verify_kvm_clock(vcpu, clock);
+ clock = run_and_verify_kvm_clock(vcpu, clock);
+
+ kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+ /* The host-clock bounds checks assume a TSC-based clocksource. */
+ TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
+
+ /* Each configuration is skipped (not failed) when unsupported. */
+ if (kvm_check_cap(KVM_CAP_GUEST_MEMFD_FLAGS) &
+ GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
+ test_no_direct_map();
+ else
+ print_skip("GUEST_MEMFD_FLAG_NO_DIRECT_MAP not supported");
+
+ if (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))
+ test_sw_protected_vm();
+ else
+ print_skip("KVM_X86_SW_PROTECTED_VM not supported");
+
+ return 0;
+}
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread

* [RFC PATCH v4 7/7] KVM: selftests: Test pfncache invalidation for gmem-backed memory
2026-04-20 15:46 [RFC PATCH v4 0/7] KVM: pfncache: Add guest_memfd support to pfncache Takahiro Itazuri
` (5 preceding siblings ...)
2026-04-20 15:46 ` [RFC PATCH v4 6/7] KVM: selftests: Test pfncache with gmem-backed memory Takahiro Itazuri
@ 2026-04-20 15:46 ` Takahiro Itazuri
6 siblings, 0 replies; 8+ messages in thread
From: Takahiro Itazuri @ 2026-04-20 15:46 UTC (permalink / raw)
To: kvm, Sean Christopherson, Paolo Bonzini
Cc: Vitaly Kuznetsov, Fuad Tabba, Brendan Jackman, David Hildenbrand,
David Woodhouse, Paul Durrant, Nikita Kalyazin, Patrick Roy,
Patrick Roy, Derek Manwaring, Alina Cernea, Michael Zoumboulakis,
Takahiro Itazuri, Takahiro Itazuri
Extend pfncache_gmem_test to verify pfncache invalidation paths:
- punch_hole: fallocate(PUNCH_HOLE) on the pvclock page's guest_memfd
frees the backing page and invalidates the pfncache. The next
vcpu_run re-resolves the PFN with a freshly allocated page.
- file release: kvm_vm_free() closes the guest_memfd fd, triggering a
full-range pfncache invalidation.
- private-to-shared conversion: KVM_SET_MEMORY_ATTRIBUTES changes the
pvclock page from private to shared, invalidating the pfncache. The
PFN is re-resolved via GUP instead of kvm_gmem_get_pfn().
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
---
.../selftests/kvm/x86/pfncache_gmem_test.c | 39 ++++++++++++++++---
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/pfncache_gmem_test.c b/tools/testing/selftests/kvm/x86/pfncache_gmem_test.c
index c61b161f3e0c..a63940c36b15 100644
--- a/tools/testing/selftests/kvm/x86/pfncache_gmem_test.c
+++ b/tools/testing/selftests/kvm/x86/pfncache_gmem_test.c
@@ -36,11 +36,10 @@ static void guest_main(vm_paddr_t pvti_pa, struct pvclock_vcpu_time_info *pvti)
{
int stage = 0;
- wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
-
for (;;) {
uint64_t clock;
+ wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
GUEST_ASSERT(pvti->system_time != 0);
clock = __pvclock_read_cycles(pvti, rdtsc());
GUEST_SYNC_CLOCK(stage++, clock);
@@ -97,7 +96,8 @@ static uint64_t run_and_verify_kvm_clock(struct kvm_vcpu *vcpu,
#define PVCLOCK_GPA (1ULL << 32)
static struct kvm_vm *setup_vm(struct vm_shape shape,
- struct kvm_vcpu **vcpu_out)
+ struct kvm_vcpu **vcpu_out,
+ vm_paddr_t *pvti_gpa_out)
{
struct kvm_vm *vm;
@@ -118,6 +118,9 @@ static struct kvm_vm *setup_vm(struct vm_shape shape,
vcpu_args_set(*vcpu_out, 2, (vm_paddr_t)PVCLOCK_GPA,
(struct pvclock_vcpu_time_info *)PVCLOCK_GPA);
vm_mem_set_private(vm, PVCLOCK_GPA, getpagesize());
+
+ if (pvti_gpa_out)
+ *pvti_gpa_out = PVCLOCK_GPA;
} else {
vm_vaddr_t pvti_gva;
vm_paddr_t pvti_gpa;
@@ -125,6 +128,9 @@ static struct kvm_vm *setup_vm(struct vm_shape shape,
pvti_gva = vm_vaddr_alloc(vm, getpagesize(), 0x10000);
pvti_gpa = addr_gva2gpa(vm, pvti_gva);
vcpu_args_set(*vcpu_out, 2, pvti_gpa, pvti_gva);
+
+ if (pvti_gpa_out)
+ *pvti_gpa_out = pvti_gpa;
}
return vm;
@@ -139,16 +145,27 @@ static void test_no_direct_map(void)
};
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
+ vm_paddr_t pvti_gpa;
uint64_t clock = 0;
pr_info("Testing pfncache with NO_DIRECT_MAP guest_memfd\n");
- vm = setup_vm(shape, &vcpu);
+ vm = setup_vm(shape, &vcpu, &pvti_gpa);
/* Verify kvm-clock works with gmem-backed pfncache (vmap KHVA) */
clock = run_and_verify_kvm_clock(vcpu, clock);
clock = run_and_verify_kvm_clock(vcpu, clock);
+ /*
+ * Punch a hole in the pvclock page's guest_memfd backing. This
+ * invalidates the pfncache; the next vcpu_run re-resolves the PFN
+ * with a freshly allocated page.
+ */
+ pr_info(" punch_hole on pvclock page\n");
+ vm_guest_mem_punch_hole(vm, pvti_gpa, getpagesize());
+ clock = run_and_verify_kvm_clock(vcpu, clock);
+
+ /* Smoke test: VM teardown (closing guest_memfd) doesn't crash. */
kvm_vm_free(vm);
}
@@ -160,16 +177,28 @@ static void test_sw_protected_vm(void)
};
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
+ vm_paddr_t pvti_gpa;
uint64_t clock = 0;
pr_info("Testing pfncache with SW_PROTECTED_VM (guest_memfd-backed private memory)\n");
- vm = setup_vm(shape, &vcpu);
+ vm = setup_vm(shape, &vcpu, &pvti_gpa);
/* Verify kvm-clock works with gmem-backed private memory */
clock = run_and_verify_kvm_clock(vcpu, clock);
clock = run_and_verify_kvm_clock(vcpu, clock);
+ /* Convert pvclock page from private to shared */
+ pr_info(" converting pvclock page: private -> shared\n");
+ vm_mem_set_shared(vm, pvti_gpa, getpagesize());
+ clock = run_and_verify_kvm_clock(vcpu, 0);
+
+ /* Convert back to private */
+ pr_info(" converting pvclock page: shared -> private\n");
+ vm_mem_set_private(vm, pvti_gpa, getpagesize());
+ clock = run_and_verify_kvm_clock(vcpu, 0);
+
+ /* Smoke test: VM teardown (closing guest_memfd) doesn't crash. */
kvm_vm_free(vm);
}
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread