[PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA

* [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA
@ 2026-06-25 22:22 Sean Christopherson
  2026-06-25 22:22 ` [PATCH 1/7] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 22:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Hyunwoo Kim, Tom Lendacky, Michael Roth,
	Jörg Rödel

Rework KVM's handling of guest-provided (and always guest_memfd-backed) VMSAs
to forcefully reclaim VMSA pages when the pages are being freed from their
backing gmem instance, e.g. in response to PUNCH_HOLE.  In the worst case
scenario, marking the page SHARED in the RMP will fail due to the page being
IN_USE, ultimately leading to RMP #PF violations due to guest_memfd freeing
the memory back to the kernel while it's still assigned to a VM.

Note, the implementation is nearly identical to that used by KVM for VMX's APIC
access page (which isn't guest controlled, but is migratable and whose PA is
shoved directly into a vCPU control structure).

Sean Christopherson (7):
  KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP
    guests
  KVM: SEV: Extract loading of guest-provided VMSA to a separate helper
  KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable
  KVM: Rework .gmem_invalidate() into .gmem_free_folio()
  KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is
    zapped
  KVM: x86: Guard .gmem_prepare() declarations with
    HAVE_KVM_GMEM_PREPARE=y
  KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid

 arch/x86/include/asm/kvm-x86-ops.h |   8 +-
 arch/x86/include/asm/kvm_host.h    |  10 +-
 arch/x86/kvm/svm/sev.c             | 152 +++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.c             |   6 +-
 arch/x86/kvm/svm/svm.h             |   8 +-
 arch/x86/kvm/x86.c                 |  10 +-
 include/linux/kvm_host.h           |   3 +-
 virt/kvm/guest_memfd.c             |  17 +---
 8 files changed, 150 insertions(+), 64 deletions(-)


base-commit: a204badd8432f93b7e862e7dac6db0fe3d65f370
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH 1/7] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests
  2026-06-25 22:22 [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA Sean Christopherson
@ 2026-06-25 22:22 ` Sean Christopherson
  2026-06-25 22:22 ` [PATCH 2/7] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper Sean Christopherson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 22:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Hyunwoo Kim, Tom Lendacky, Michael Roth,
	Jörg Rödel

Track the GPA of the guest-provided VMSA used after AP_CREATION events when
running SNP guests, instead of simply tracking whether or not the vCPU is
using a guest-provided VMSA.  KVM needs to know the GPA of the VMSA that's
actively being used so that it can react to MMU invalidation events, i.e.
so that KVM can drop the VMSA if its backing guest_memfd page is punched
out of existence.

Opportunistically rename snp_vmsa_gpa to clarify that it tracks the pending
VMSA GPA, whereas snp_guest_vmsa_gpa now tracks the in-use VMSA GPA.

Note!  Take care to track the GPA, not the GFN, as VALID_PAGE() won't
behave correctly if an invalid GFN is converted to a GPA for checking.
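
As an illustration of the GPA vs. GFN note above, here is a minimal
userspace sketch.  The typedefs and macros are local stand-ins written to
mirror KVM's INVALID_PAGE, VALID_PAGE() and gfn_to_gpa() (an assumption,
they are not pulled from kernel headers), and the two
tracked_*_looks_valid() helpers are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins mirroring the kernel definitions (assumption). */
typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

#define PAGE_SHIFT	12
#define INVALID_PAGE	(~(uint64_t)0)
#define VALID_PAGE(x)	((x) != INVALID_PAGE)

static inline gpa_t gfn_to_gpa(gfn_t gfn)
{
	return (gpa_t)gfn << PAGE_SHIFT;
}

/*
 * Hypothetical alternative: track a GFN and convert it to a GPA only when
 * checking.  The shift clears the low 12 bits, so an all-ones "invalid" GFN
 * becomes 0xfffffffffffff000, which no longer compares equal to INVALID_PAGE.
 */
static inline bool tracked_gfn_looks_valid(gfn_t gfn)
{
	return VALID_PAGE(gfn_to_gpa(gfn));
}

/* What the patch does instead: track the GPA and check it as-is. */
static inline bool tracked_gpa_looks_valid(gpa_t gpa)
{
	return VALID_PAGE(gpa);
}
```

With these stand-ins, an all-ones GFN passes the converted check while an
all-ones GPA is rejected, which is the misbehavior the note warns about.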

Note #2!  Keep snp_has_guest_vmsa so that switching to a guest-provided
VMSA is sticky, even if the guest-provided VMSA becomes invalid.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 14 +++++++++-----
 arch/x86/kvm/svm/svm.h |  3 ++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 74fb15551e83..827f5dc06102 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4003,6 +4003,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 
 	/* Clear use of the VMSA */
 	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
 
 	/*
 	 * When replacing the VMSA during SEV-SNP AP creation,
@@ -4010,11 +4011,11 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	 */
 	vmcb_mark_all_dirty(svm->vmcb);
 
-	if (!VALID_PAGE(svm->sev_es.snp_vmsa_gpa))
+	if (!VALID_PAGE(svm->sev_es.snp_pending_vmsa_gpa))
 		return;
 
-	gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
-	svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	gfn = gpa_to_gfn(svm->sev_es.snp_pending_vmsa_gpa);
+	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
 
 	slot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (!slot)
@@ -4039,6 +4040,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	svm->sev_es.snp_has_guest_vmsa = true;
 
 	/* Use the new VMSA */
+	svm->sev_es.snp_guest_vmsa_gpa = gfn_to_gpa(gfn);
 	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
 
 	/* Mark the vCPU as runnable */
@@ -4105,10 +4107,10 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 			return -EINVAL;
 		}
 
-		target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		target_svm->sev_es.snp_pending_vmsa_gpa = svm->vmcb->control.exit_info_2;
 		break;
 	case SVM_VMGEXIT_AP_DESTROY:
-		target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+		target_svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
 		break;
 	default:
 		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
@@ -4791,6 +4793,8 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu)
 		return -ENOMEM;
 
 	svm->sev_es.vmsa = page_address(vmsa_page);
+	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
+	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
 
 	vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 716be21fba33..d077783c287e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -271,7 +271,8 @@ struct vcpu_sev_es_state {
 	u64 ghcb_registered_gpa;
 
 	struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
-	gpa_t snp_vmsa_gpa;
+	gpa_t snp_pending_vmsa_gpa;
+	gpa_t snp_guest_vmsa_gpa;
 	bool snp_ap_waiting_for_reset;
 	bool snp_has_guest_vmsa;
 };
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH 2/7] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper
  2026-06-25 22:22 [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA Sean Christopherson
  2026-06-25 22:22 ` [PATCH 1/7] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
@ 2026-06-25 22:22 ` Sean Christopherson
  2026-06-25 22:22 ` [PATCH 3/7] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable Sean Christopherson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 22:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Hyunwoo Kim, Tom Lendacky, Michael Roth,
	Jörg Rödel

Extract the loading/retrieval of a guest-provided VMSA to a separate helper
so that KVM can reuse the core logic when refreshing the VMSA after an MMU
invalidation from guest_memfd.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 52 +++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 827f5dc06102..d8ed00f76aa3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3979,29 +3979,17 @@ static int snp_begin_psc(struct vcpu_svm *svm)
 	return snp_do_psc(svm);
 }
 
-/*
- * Invoked as part of svm_vcpu_reset() processing of an init event.
- */
-static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_memory_slot *slot;
+	gfn_t gfn = gpa_to_gfn(gpa);
 	struct page *page;
 	kvm_pfn_t pfn;
-	gfn_t gfn;
 
-	guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
+	lockdep_assert_held(&svm->sev_es.snp_vmsa_mutex);
 
-	if (!svm->sev_es.snp_ap_waiting_for_reset)
-		return;
-
-	svm->sev_es.snp_ap_waiting_for_reset = false;
-
-	/* Mark the vCPU as offline and not runnable */
-	vcpu->arch.pv.pv_unhalted = false;
-	kvm_set_mp_state(vcpu, KVM_MP_STATE_HALTED);
-
-	/* Clear use of the VMSA */
+	/* Clear use of the VMSA. */
 	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
 	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
 
@@ -4011,12 +3999,9 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	 */
 	vmcb_mark_all_dirty(svm->vmcb);
 
-	if (!VALID_PAGE(svm->sev_es.snp_pending_vmsa_gpa))
+	if (!VALID_PAGE(gpa))
 		return;
 
-	gfn = gpa_to_gfn(svm->sev_es.snp_pending_vmsa_gpa);
-	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
-
 	slot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (!slot)
 		return;
@@ -4040,7 +4025,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	svm->sev_es.snp_has_guest_vmsa = true;
 
 	/* Use the new VMSA */
-	svm->sev_es.snp_guest_vmsa_gpa = gfn_to_gpa(gfn);
+	svm->sev_es.snp_guest_vmsa_gpa = gpa;
 	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
 
 	/* Mark the vCPU as runnable */
@@ -4054,6 +4039,31 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	kvm_release_page_clean(page);
 }
 
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	gpa_t gpa;
+
+	guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
+
+	if (!svm->sev_es.snp_ap_waiting_for_reset)
+		return;
+
+	svm->sev_es.snp_ap_waiting_for_reset = false;
+
+	/* Mark the vCPU as offline and not runnable */
+	vcpu->arch.pv.pv_unhalted = false;
+	kvm_set_mp_state(vcpu, KVM_MP_STATE_HALTED);
+
+	gpa = svm->sev_es.snp_pending_vmsa_gpa;
+	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
+
+	sev_snp_reload_vmsa(vcpu, gpa);
+}
+
 static int sev_snp_ap_creation(struct vcpu_svm *svm)
 {
 	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH 3/7] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable
  2026-06-25 22:22 [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA Sean Christopherson
  2026-06-25 22:22 ` [PATCH 1/7] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
  2026-06-25 22:22 ` [PATCH 2/7] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper Sean Christopherson
@ 2026-06-25 22:22 ` Sean Christopherson
  2026-06-25 22:22 ` [PATCH 4/7] KVM: Rework .gmem_invalidate() into .gmem_free_folio() Sean Christopherson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 22:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Hyunwoo Kim, Tom Lendacky, Michael Roth,
	Jörg Rödel

Always mark the vCPU as RUNNABLE after responding to AP_CREATE, even if the
guest-specified VMSA is unusable, e.g. isn't backed by a memslot or doesn't
have a backing guest_memfd page.  If the VMSA is unusable, leaving the vCPU
in a non-running state will effectively hang the vCPU instead of reporting
an error to userspace.  This will also allow retrying the VMSA load in the
future, to fix a bug where KVM doesn't honor guest_memfd invalidation
events, e.g. if AP_CREATION races with PUNCH_HOLE.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d8ed00f76aa3..30792adcfc8e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4028,9 +4028,6 @@ static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	svm->sev_es.snp_guest_vmsa_gpa = gpa;
 	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
 
-	/* Mark the vCPU as runnable */
-	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
 	/*
 	 * gmem pages aren't currently migratable, but if this ever changes
 	 * then care should be taken to ensure svm->sev_es.vmsa is pinned
@@ -4062,6 +4059,15 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
 
 	sev_snp_reload_vmsa(vcpu, gpa);
+
+	/*
+	 * Mark the vCPU as runnable for CREATE requests, indicated by a valid
+	 * VMSA GPA, even if installing the VMSA failed, so that KVM_RUN will
+	 * fail instead of blocking indefinitely and hanging the vCPU, e.g. if
+	 * the backing guest_memfd page is unavailable.
+	 */
+	if (VALID_PAGE(gpa))
+		kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
 }
 
 static int sev_snp_ap_creation(struct vcpu_svm *svm)
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH 4/7] KVM: Rework .gmem_invalidate() into .gmem_free_folio()
  2026-06-25 22:22 [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA Sean Christopherson
                   ` (2 preceding siblings ...)
  2026-06-25 22:22 ` [PATCH 3/7] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable Sean Christopherson
@ 2026-06-25 22:22 ` Sean Christopherson
  2026-06-25 22:39   ` sashiko-bot
  2026-06-25 22:22 ` [PATCH 5/7] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped Sean Christopherson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 22:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Hyunwoo Kim, Tom Lendacky, Michael Roth,
	Jörg Rödel

Rename .gmem_invalidate() to .gmem_free_folio() as the hook is called when
a folio is freed, which is far too late and lacks sufficient information for
KVM to actually invalidate its usage of the memory.  Drop guest_memfd's
trampoline and just wire up .free_folio() directly to the arch callback so
that the chain of events is clear and obvious.
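
For reference, the only work the dropped trampoline did was turn a folio
into a [start, end) PFN range, which the arch callback now computes itself.
A minimal userspace sketch of that arithmetic follows; struct pfn_range and
folio_pfn_range() are hypothetical names for illustration, and the first
PFN and order are passed in directly because struct folio isn't available
outside the kernel.

```c
#include <stdint.h>

typedef uint64_t kvm_pfn_t;

/* Hypothetical helper type: half-open PFN range [start, end). */
struct pfn_range {
	kvm_pfn_t start;
	kvm_pfn_t end;
};

/*
 * A folio of order N spans 2^N contiguous pages, so the range ends at
 * first_pfn + (1 << order).
 */
static struct pfn_range folio_pfn_range(kvm_pfn_t first_pfn, unsigned int order)
{
	struct pfn_range r = {
		.start = first_pfn,
		.end = first_pfn + ((kvm_pfn_t)1 << order),
	};

	return r;
}
```

An order-0 folio yields a single-PFN range, while an order-9 folio (2MiB
with 4KiB pages) covers 512 PFNs.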

Opportunistically guard kvm_x86_ops.gmem_free_folio with an ifdef to
ensure the callback will actually be called, e.g. so that non-SEV code
doesn't try to wire up a callback without enabling
CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  4 +++-
 arch/x86/include/asm/kvm_host.h    |  4 +++-
 arch/x86/kvm/svm/sev.c             |  4 +++-
 arch/x86/kvm/svm/svm.c             |  4 +++-
 arch/x86/kvm/svm/svm.h             |  3 +--
 arch/x86/kvm/x86.c                 |  4 ++--
 include/linux/kvm_host.h           |  2 +-
 virt/kvm/guest_memfd.c             | 13 +------------
 8 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 83dc5086138b..e36eba952705 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -147,7 +147,9 @@ KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
 KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
-KVM_X86_OP_OPTIONAL(gmem_invalidate)
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+KVM_X86_OP_OPTIONAL(gmem_free_folio)
+#endif
 #endif
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b517257a6315..dd542c7a7376 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1901,7 +1901,9 @@ struct kvm_x86_ops {
 	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
-	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+	void (*gmem_free_folio)(struct folio *folio);
+#endif
 	int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 };
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 30792adcfc8e..3d90aa723dc2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -5136,8 +5136,10 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 	return 0;
 }
 
-void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+void sev_gmem_free_folio(struct folio *folio)
 {
+	kvm_pfn_t start = page_to_pfn(folio_page(folio, 0));
+	kvm_pfn_t end = start + (1ul << folio_order(folio));
 	kvm_pfn_t pfn;
 
 	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ef69a51ab27f..6f1823e820a4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5460,9 +5460,11 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
+#ifdef CONFIG_KVM_AMD_SEV
 	.gmem_prepare = sev_gmem_prepare,
-	.gmem_invalidate = sev_gmem_invalidate,
+	.gmem_free_folio = sev_gmem_free_folio,
 	.gmem_max_mapping_level = sev_gmem_max_mapping_level,
+#endif
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d077783c287e..2f8215810a08 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -1009,7 +1009,7 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
-void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+void sev_gmem_free_folio(struct folio *folio);
 int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
 void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa);
@@ -1039,7 +1039,6 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 {
 	return 0;
 }
-static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
 static inline int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
 	return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0626e835e9eb..adc1e1b244c7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10592,9 +10592,9 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
 #endif
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+void kvm_arch_gmem_free_folio(struct folio *folio)
 {
-	kvm_x86_call(gmem_invalidate)(start, end);
+	kvm_x86_call(gmem_free_folio)(folio);
 }
 #endif
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab8cfaec82d3..e5b47a5e4cea 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2607,7 +2607,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
 #endif
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+void kvm_arch_gmem_free_folio(struct folio *folio);
 #endif
 
 #ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 86690683b2fe..1618acc3ca64 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -523,23 +523,12 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 	return MF_DELAYED;
 }
 
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-static void kvm_gmem_free_folio(struct folio *folio)
-{
-	struct page *page = folio_page(folio, 0);
-	kvm_pfn_t pfn = page_to_pfn(page);
-	int order = folio_order(folio);
-
-	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
-}
-#endif
-
 static const struct address_space_operations kvm_gmem_aops = {
 	.dirty_folio = noop_dirty_folio,
 	.migrate_folio	= kvm_gmem_migrate_folio,
 	.error_remove_folio = kvm_gmem_error_folio,
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-	.free_folio = kvm_gmem_free_folio,
+	.free_folio = kvm_arch_gmem_free_folio,
 #endif
 };
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* Re: [PATCH 4/7] KVM: Rework .gmem_invalidate() into .gmem_free_folio()
  2026-06-25 22:22 ` [PATCH 4/7] KVM: Rework .gmem_invalidate() into .gmem_free_folio() Sean Christopherson
@ 2026-06-25 22:39   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-25 22:39 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] KVM module can be unloaded while guest_memfd folios are being evicted, causing a Use-After-Free crash when calling a_ops->free_folio().
--

commit dd2a52e3b94932f45c23fd6812c93b476a753c85
Author: Sean Christopherson <seanjc@google.com>

KVM: Rework .gmem_invalidate() into .gmem_free_folio()

Rename .gmem_invalidate() to .gmem_free_folio() as the hook is called when
a folio is freed. Drop guest_memfd's trampoline and wire up .free_folio()
directly to the arch callback.

> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 86690683b2fe3..1618acc3ca647 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
[ ... ]
> @@ -523,23 +523,12 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>  	return MF_DELAYED;
>  }
>  
> -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> -static void kvm_gmem_free_folio(struct folio *folio)
> -{
> -	struct page *page = folio_page(folio, 0);
> -	kvm_pfn_t pfn = page_to_pfn(page);
> -	int order = folio_order(folio);
> -
> -	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
> -}
> -#endif
> -
>  static const struct address_space_operations kvm_gmem_aops = {
>  	.dirty_folio = noop_dirty_folio,
>  	.migrate_folio	= kvm_gmem_migrate_folio,
>  	.error_remove_folio = kvm_gmem_error_folio,
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> -	.free_folio = kvm_gmem_free_folio,
> +	.free_folio = kvm_arch_gmem_free_folio,

[Severity: High]
This is a pre-existing issue, but can this lead to an execution-after-free
crash during module unload?

When a user closes all KVM-related file descriptors, __fput() drops the
module reference via fops_put(file->f_op) before it calls dput(). The
dput() call subsequently drops the inode reference and triggers page cache
eviction. 

Because kvm_gmem_fs lacks .owner = THIS_MODULE, the superblock and
inodes do not independently pin the KVM module in memory.

If an administrator unloads the KVM module after fops_put() but before
eviction completes, does the mapping->a_ops->free_folio() call in the page
cache eviction path attempt to execute unmapped memory?

>  #endif
>  };
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625222229.3367197-1-seanjc@google.com?part=4


* [PATCH 5/7] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped
  2026-06-25 22:22 [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA Sean Christopherson
                   ` (3 preceding siblings ...)
  2026-06-25 22:22 ` [PATCH 4/7] KVM: Rework .gmem_invalidate() into .gmem_free_folio() Sean Christopherson
@ 2026-06-25 22:22 ` Sean Christopherson
  2026-06-25 22:45   ` sashiko-bot
  2026-06-25 22:22 ` [PATCH 6/7] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y Sean Christopherson
  2026-06-25 22:22 ` [PATCH 7/7] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid Sean Christopherson
  6 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 22:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Hyunwoo Kim, Tom Lendacky, Michael Roth,
	Jörg Rödel

Wire up a gmem_invalidate_range() call for SNP VMs, and use it to force vCPUs
to reload/recheck their guest-provided VMSA if the backing guest_memfd page
is being invalidated, e.g. is being PUNCH_HOLE'd.  Use the same core logic
to handle invalidations as VMX does for the APIC-access page, as the two
concepts are nearly identical, i.e. both shove the physical address of a
page into the vCPU's control structure:

 1. Snapshot the invalidation sequence counter
 2. Grab the pfn (from guest_memfd in this case)
 3. Acquire mmu_lock for read
 4. Re-request reload if retry is needed, otherwise commit the change.

Note, the re-request action in #4 is necessary as KVM's retry logic is
fuzzy, i.e. can get false positives.  If the guest_memfd page has been
dropped, at some point a subsequent reload will fail to get a PFN from
guest_memfd, and KVM will fail KVM_RUN.  If the retry was due to a false
positive, KVM will retry until there are no relevant MMU notifier events
(and will retry in the "outer" loop, i.e. will drop locks and resched as
needed).
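
The four steps above can be sketched as a single-threaded userspace toy.
This is an illustration under stated assumptions, not the KVM code:
struct toy_vm, struct toy_vcpu and all toy_*() helpers are hypothetical,
the retry check is a simplified stand-in for mmu_invalidate_retry_gfn(),
and the real smp_rmb() and mmu_lock are omitted because nothing runs
concurrently here.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define INVALID_PAGE	(~(uint64_t)0)

/* Hypothetical stand-in for the per-VM invalidation state. */
struct toy_vm {
	unsigned long invalidate_seq;	/* bumped by every invalidation */
	bool invalidate_in_progress;	/* an invalidation is running now */
	uint64_t backing_pfn;		/* INVALID_PAGE once the page is gone */
};

/* Hypothetical stand-in for the per-vCPU VMSA state. */
struct toy_vcpu {
	uint64_t vmsa_pa;
	bool reload_requested;
};

/* Simplified retry check: any in-flight or completed invalidation counts. */
static bool toy_retry_needed(const struct toy_vm *vm, unsigned long seq)
{
	return vm->invalidate_in_progress || vm->invalidate_seq != seq;
}

/* Steps 1 and 2: snapshot the sequence counter, then grab the pfn. */
static bool toy_lookup(const struct toy_vm *vm, unsigned long *seq,
		       uint64_t *pfn)
{
	*seq = vm->invalidate_seq;
	*pfn = vm->backing_pfn;
	return *pfn != INVALID_PAGE;
}

/* Steps 3 and 4: (lock elided) re-request on retry, otherwise commit. */
static void toy_commit(const struct toy_vm *vm, struct toy_vcpu *vcpu,
		       unsigned long seq, uint64_t pfn)
{
	if (toy_retry_needed(vm, seq))
		vcpu->reload_requested = true;
	else
		vcpu->vmsa_pa = pfn << TOY_PAGE_SHIFT;
}

/* Simulate PUNCH_HOLE: drop the backing page and bump the counter. */
static void toy_punch_hole(struct toy_vm *vm)
{
	vm->backing_pfn = INVALID_PAGE;
	vm->invalidate_seq++;
}
```

If toy_punch_hole() runs between toy_lookup() and toy_commit(), the stale
pfn is never installed and a reload is requested instead; the next
toy_lookup() then fails, which corresponds to KVM failing KVM_RUN.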

Failure to invalidate the vCPU's control.vmsa_pa (which is checked by
pre_sev_run()) can prevent KVM from properly freeing the page, as firmware
will reject the RMPUPDATE to reclaim the page with FAIL_INUSE if the vCPU
is actively running, i.e. if the VMSA page is in use.  That in turn leads to
an RMP #PF on the next use, as the page will still be assigned to the SNP VM.

  SEV-SNP: RMPUPDATE failed for PFN 78d198, pg_level: 1, ret: 3
  SEV-SNP: PFN 0x78d198, RMP entry: [0xfff0000000144001 - 0x000000000000000f]
  CPU: 3 UID: 0 PID: 31345 Comm: sev_snp_vmsa_pu Tainted: G     U     O
  Tainted: [U]=USER, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
  Call Trace:
   <TASK>
   dump_stack_lvl+0x54/0x70
   rmpupdate+0x12c/0x140
   rmp_make_shared+0x3b/0x60
   sev_gmem_invalidate+0xe0/0x170 [kvm_amd]
   delete_from_page_cache_batch+0x1d8/0x220
   truncate_inode_pages_range+0x120/0x3d0
   kvm_gmem_fallocate+0x19a/0x270 [kvm]
   vfs_fallocate+0x1bc/0x1f0
   __x64_sys_fallocate+0x48/0x70
   do_syscall_64+0x10a/0x480
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x496c7e
   </TASK>
  ------------[ cut here ]------------
  SEV: Failed to update RMP entry for PFN 0x78d198 error -14
  WARNING: arch/x86/kvm/svm/sev.c:5160 at sev_gmem_invalidate+0x126/0x170 [kvm_amd], CPU#3: sev_snp_vmsa_pu/31345
  CPU: 3 UID: 0 PID: 31345 Comm: sev_snp_vmsa_pu Tainted: G     U     O
  Tainted: [U]=USER, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
  RIP: 0010:sev_gmem_invalidate+0x12b/0x170 [kvm_amd]
  Call Trace:
   <TASK>
   delete_from_page_cache_batch+0x1d8/0x220
   truncate_inode_pages_range+0x120/0x3d0
   kvm_gmem_fallocate+0x19a/0x270 [kvm]
   vfs_fallocate+0x1bc/0x1f0
   __x64_sys_fallocate+0x48/0x70
   do_syscall_64+0x10a/0x480
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x496c7e
   </TASK>
  irq event stamp: 20689
  hardirqs last  enabled at (20699): [<ffffffff8e76092c>] __console_unlock+0x5c/0x60
  hardirqs last disabled at (20708): [<ffffffff8e760911>] __console_unlock+0x41/0x60
  softirqs last  enabled at (20722): [<ffffffff8e6cd74e>] __irq_exit_rcu+0x7e/0x140
  softirqs last disabled at (20717): [<ffffffff8e6cd74e>] __irq_exit_rcu+0x7e/0x140
  ---[ end trace 0000000000000000 ]---
  BUG: unable to handle page fault for address: ffff99a64d198000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x80000003) - RMP violation
  PGD 13eb001067 P4D 13eb001067 PUD 78d1d1063 PMD 1184e0063 PTE 800000078d198163
  SEV-SNP: PFN 0x78d198, RMP entry: [0x6030000000144001 - 0x000000000000000f]
  Oops: Oops: 0003 [#1] SMP
  CPU: 3 UID: 0 PID: 31407 Comm: highlanderd_hea Tainted: G     U  W  O
  Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
  RIP: 0010:prep_new_page+0x67/0x220
  Call Trace:
   <TASK>
   get_page_from_freelist+0x1c40/0x1c70
   __alloc_frozen_pages_noprof+0xca/0x1f0
   alloc_pages_mpol+0x10b/0x1b0
   alloc_pages_noprof+0x81/0x90
   pte_alloc_one+0x1b/0xd0
   do_pte_missing+0xdf/0x1020
   handle_mm_fault+0x7c7/0xb20
   do_user_addr_fault+0x268/0x6b0
   exc_page_fault+0x67/0xa0
   asm_exc_page_fault+0x26/0x30
  RIP: 0033:0x4a6b1e
   </TASK>
  gsmi: Log Shutdown Reason 0x03
  CR2: ffff99a64d198000
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:prep_new_page+0x67/0x220

Drop the pseudo-TODO comment about needing to pin the page if guest_memfd
ever supports migration, as integrating with invalidation events means
KVM will Just Work if/when page migration is ever supported (assuming SNP
hardware supports migrating VMSA pages).

Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Closes: https://lore.kernel.org/all/aimMWzAf5b3luM0b@v4bel
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Cc: stable@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Jörg Rödel <joro@8bytes.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +
 arch/x86/include/asm/kvm_host.h    |  4 ++
 arch/x86/kvm/svm/sev.c             | 62 +++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.c             |  2 +
 arch/x86/kvm/svm/svm.h             |  2 +
 arch/x86/kvm/x86.c                 |  6 +++
 include/linux/kvm_host.h           |  1 +
 virt/kvm/guest_memfd.c             |  4 ++
 8 files changed, 74 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index e36eba952705..69ca2a848ad6 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
 KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
 KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
 KVM_X86_OP_OPTIONAL(guest_memory_reclaimed)
+KVM_X86_OP_OPTIONAL(reload_vmsa)
 KVM_X86_OP(get_feature_msr)
 KVM_X86_OP(check_emulate_instruction)
 KVM_X86_OP(apic_init_signal_blocked)
@@ -148,6 +149,7 @@ KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
 KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+KVM_X86_OP_OPTIONAL(gmem_invalidate_range)
 KVM_X86_OP_OPTIONAL(gmem_free_folio)
 #endif
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dd542c7a7376..776272dc6fdc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -122,6 +122,8 @@
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VMSA_PAGE_RELOAD \
+	KVM_ARCH_REQ_FLAGS(33, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
 	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
 
@@ -1878,6 +1880,7 @@ struct kvm_x86_ops {
 	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
 	int (*vm_move_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
 	void (*guest_memory_reclaimed)(struct kvm *kvm);
+	void (*reload_vmsa)(struct kvm_vcpu *vcpu);
 
 	int (*get_feature_msr)(u32 msr, u64 *data);
 
@@ -1902,6 +1905,7 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+	void (*gmem_invalidate_range)(struct kvm *kvm, struct kvm_gfn_range *range);
 	void (*gmem_free_folio)(struct folio *folio);
 #endif
 	int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3d90aa723dc2..04be49b1af57 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3979,11 +3979,13 @@ static int snp_begin_psc(struct vcpu_svm *svm)
 	return snp_do_psc(svm);
 }
 
-static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
+static void __sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_memory_slot *slot;
+	struct kvm *kvm = vcpu->kvm;
 	gfn_t gfn = gpa_to_gfn(gpa);
+	unsigned long mmu_seq;
 	struct page *page;
 	kvm_pfn_t pfn;
 
@@ -4006,6 +4008,9 @@ static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	if (!slot)
 		return;
 
+	mmu_seq = kvm->mmu_invalidate_seq;
+	smp_rmb();
+
 	/*
 	 * The new VMSA will be private memory guest memory, so retrieve the
 	 * PFN from the gmem backend.
@@ -4024,15 +4029,20 @@ static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	 */
 	svm->sev_es.snp_has_guest_vmsa = true;
 
-	/* Use the new VMSA */
+	read_lock(&kvm->mmu_lock);
+	/*
+	 * Save the guest-provided GPA.  If retry is needed, then KVM will try
+	 * again with the same GPA.  If the VMSA is usable, then KVM needs to
+	 * track the GPA so that the VMSA can be reloaded if the backing page
+	 * for the GPA is invalidated.
+	 */
 	svm->sev_es.snp_guest_vmsa_gpa = gpa;
-	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+	if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn))
+		kvm_make_request(KVM_REQ_VMSA_PAGE_RELOAD, vcpu);
+	else
+		svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+	read_unlock(&kvm->mmu_lock);
 
-	/*
-	 * gmem pages aren't currently migratable, but if this ever changes
-	 * then care should be taken to ensure svm->sev_es.vmsa is pinned
-	 * through some other means.
-	 */
 	kvm_release_page_clean(page);
 }
 
@@ -4058,7 +4068,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	gpa = svm->sev_es.snp_pending_vmsa_gpa;
 	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
 
-	sev_snp_reload_vmsa(vcpu, gpa);
+	__sev_snp_reload_vmsa(vcpu, gpa);
 
 	/*
 	 * Mark the vCPU as runnable for CREATE requests, indicated by a valid
@@ -4070,6 +4080,15 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 		kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
 }
 
+void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_sev_es_state *sev_es = &to_svm(vcpu)->sev_es;
+
+	guard(mutex)(&sev_es->snp_vmsa_mutex);
+
+	__sev_snp_reload_vmsa(vcpu, sev_es->snp_guest_vmsa_gpa);
+}
+
 static int sev_snp_ap_creation(struct vcpu_svm *svm)
 {
 	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
@@ -5135,6 +5154,31 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 
 	return 0;
 }
+void sev_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	/*
+	 * An unstable result for "is SNP" is a-ok here, thanks to mmu_lock.
+	 * The vCPU's VMSA GPA is invalidated before the vCPU is made visible
+	 * to other tasks, and can only become valid while holding mmu_lock,
+	 * after the VM is fully committed to being an SNP VM.
+	 */
+	if (!____sev_snp_guest(kvm))
+		return;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		gpa_t gpa = to_svm(vcpu)->sev_es.snp_guest_vmsa_gpa;
+
+		if (VALID_PAGE(gpa) &&
+		    gpa_to_gfn(gpa) >= range->start &&
+		    gpa_to_gfn(gpa) < range->end)
+			kvm_make_request_and_kick(KVM_REQ_VMSA_PAGE_RELOAD, vcpu);
+	}
+}
 
 void sev_gmem_free_folio(struct folio *folio)
 {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6f1823e820a4..7d3dd3719070 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5445,6 +5445,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
 	.mem_enc_register_region = sev_mem_enc_register_region,
 	.mem_enc_unregister_region = sev_mem_enc_unregister_region,
 	.guest_memory_reclaimed = sev_guest_memory_reclaimed,
+	.reload_vmsa = sev_snp_reload_vmsa,
 
 	.vm_copy_enc_context_from = sev_vm_copy_enc_context_from,
 	.vm_move_enc_context_from = sev_vm_move_enc_context_from,
@@ -5462,6 +5463,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
 
 #ifdef CONFIG_KVM_AMD_SEV
 	.gmem_prepare = sev_gmem_prepare,
+	.gmem_invalidate_range = sev_gmem_invalidate_range,
 	.gmem_free_folio = sev_gmem_free_folio,
 	.gmem_max_mapping_level = sev_gmem_max_mapping_level,
 #endif
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2f8215810a08..c7ecc5fca689 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -996,6 +996,7 @@ static inline struct page *snp_safe_alloc_page(void)
 {
 	return snp_safe_alloc_page_node(numa_node_id(), GFP_KERNEL_ACCOUNT);
 }
+void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu);
 
 int sev_vcpu_create(struct kvm_vcpu *vcpu);
 void sev_free_vcpu(struct kvm_vcpu *vcpu);
@@ -1009,6 +1010,7 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range);
 void sev_gmem_free_folio(struct folio *folio);
 int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index adc1e1b244c7..9df6acf9a982 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8167,6 +8167,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 				goto out;
 			}
 		}
+		if (kvm_check_request(KVM_REQ_VMSA_PAGE_RELOAD, vcpu))
+			kvm_x86_call(reload_vmsa)(vcpu);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -10592,6 +10594,10 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
 #endif
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+void kvm_arch_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	kvm_x86_call(gmem_invalidate_range)(kvm, range);
+}
 void kvm_arch_gmem_free_folio(struct folio *folio)
 {
 	kvm_x86_call(gmem_free_folio)(folio);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e5b47a5e4cea..6b7f8801505d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2607,6 +2607,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
 #endif
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+void kvm_arch_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range);
 void kvm_arch_gmem_free_folio(struct folio *folio);
 #endif
 
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 1618acc3ca64..8ec5041934db 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -185,6 +185,10 @@ static void __kvm_gmem_invalidate_start(struct gmem_file *f, pgoff_t start,
 		}
 
 		flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
+
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+		kvm_arch_gmem_invalidate_range(kvm, &gfn_range);
+#endif
 	}
 
 	if (flush)
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* Re: [PATCH 5/7] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped
  2026-06-25 22:22 ` [PATCH 5/7] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped Sean Christopherson
@ 2026-06-25 22:45   ` sashiko-bot
  2026-06-25 23:28     ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: sashiko-bot @ 2026-06-25 22:45 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Critical] Deleting a guest_memfd-backed memslot severs invalidation callbacks but leaves the VMSA loaded in hardware, allowing the pages to be freed while in-use and crashing the host.
--

commit 1d8649cc04198822af1b4661735a4bc1d852fe1f
Author: Sean Christopherson <seanjc@google.com>

KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped

This commit wires up a gmem_invalidate_range() call for SNP VMs. It forces
vCPUs to reload or recheck their guest-provided VMSA if the backing
guest_memfd page is being invalidated.
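
The reload/recheck handshake summarized above can be sketched as a small
userspace model.  This is only an illustration under stated assumptions, not
the kernel code: every toy_* name is hypothetical, a pfn of 0 stands in for
"nothing installed", and locking and memory barriers are omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for KVM state; none of these are the kernel's real types. */
struct toy_vm {
	unsigned long invalidate_seq;	/* bumped when an invalidation finishes */
	bool invalidate_in_progress;
	uint64_t inv_start, inv_end;	/* gfn range being invalidated: [start, end) */
};

struct toy_vcpu {
	bool vmsa_gfn_valid;	/* the guest has provided a VMSA gfn */
	uint64_t vmsa_gfn;
	uint64_t vmsa_pfn;	/* 0 means "nothing installed" in this model */
	bool reload_pending;	/* models KVM_REQ_VMSA_PAGE_RELOAD */
};

/* Retry if the gfn is in an in-flight range, or if the snapshot is stale. */
static bool toy_retry_gfn(const struct toy_vm *vm, unsigned long seq, uint64_t gfn)
{
	if (vm->invalidate_in_progress &&
	    gfn >= vm->inv_start && gfn < vm->inv_end)
		return true;
	return vm->invalidate_seq != seq;
}

/* Always record the gfn; install the pfn only if nothing raced with the lookup. */
static void toy_install_vmsa(const struct toy_vm *vm, struct toy_vcpu *vcpu,
			     uint64_t gfn, unsigned long seq, uint64_t pfn)
{
	vcpu->vmsa_gfn_valid = true;
	vcpu->vmsa_gfn = gfn;
	if (toy_retry_gfn(vm, seq, gfn))
		vcpu->reload_pending = true;
	else
		vcpu->vmsa_pfn = pfn;
}

/* Invalidation start: flag every vCPU whose VMSA gfn falls in [start, end). */
static void toy_invalidate_begin(struct toy_vm *vm, struct toy_vcpu *vcpus,
				 size_t nr, uint64_t start, uint64_t end)
{
	assert(start < end);
	vm->invalidate_in_progress = true;
	vm->inv_start = start;
	vm->inv_end = end;

	for (size_t i = 0; i < nr; i++) {
		if (vcpus[i].vmsa_gfn_valid &&
		    vcpus[i].vmsa_gfn >= start && vcpus[i].vmsa_gfn < end)
			vcpus[i].reload_pending = true;
	}
}

/* Invalidation end: bumping the sequence makes older snapshots stale. */
static void toy_invalidate_end(struct toy_vm *vm)
{
	vm->invalidate_in_progress = false;
	vm->invalidate_seq++;
}
```

In this model, a caller snapshots invalidate_seq, looks up the pfn, and then
calls toy_install_vmsa().  If an invalidation began or completed in between,
the vCPU ends up with reload_pending set rather than a possibly stale pfn.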

Closes: https://lore.kernel.org/all/aimMWzAf5b3luM0b@v4bel

> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 1618acc3ca647..8ec5041934dbc 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -185,6 +185,10 @@ static void __kvm_gmem_invalidate_start(struct gmem_file *f, pgoff_t start,
>  		}
>  
>  		flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
> +
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +		kvm_arch_gmem_invalidate_range(kvm, &gfn_range);
> +#endif
>  	}
>  
>  	if (flush)

[Severity: Critical]
Does this invalidation path cover the case where the memslot is unbound before
the guest_memfd pages are freed?

If userspace deletes the memslot (e.g., via KVM_SET_USER_MEMORY_REGION2),
kvm_gmem_unbind() removes the slot from the f->bindings xarray.

If userspace then closes the guest_memfd file descriptor,
__kvm_gmem_invalidate_start() will iterate over f->bindings here, find it
empty, and skip calling kvm_arch_gmem_invalidate_range().

Since the vCPU is never kicked to reload the VMSA, the hardware could still be
actively using the page. When sev_gmem_free_folio() attempts to transition the
pages to shared via rmp_make_shared(), the hardware will reject it with
FAIL_INUSE.

Could this leak the encrypted page to the host allocator and subsequently
cause an RMP violation #PF?
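
As a rough illustration of the sequence described above, here is a toy
userspace model.  It is a sketch of the suspected scenario only: all toy_*
names are hypothetical, the bindings xarray is reduced to a fixed array, and
the RMP transition is reduced to a boolean "in use" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BINDINGS 4

struct toy_vcpu {
	bool vmsa_in_use;	/* hardware still references the VMSA page */
};

struct toy_slot {
	struct toy_vcpu *vcpu;	/* vCPU whose VMSA lives in this slot */
};

/* Stand-in for the guest_memfd file and its bindings; not the kernel's xarray. */
struct toy_gmem {
	struct toy_slot *bindings[TOY_MAX_BINDINGS];
};

static void toy_bind(struct toy_gmem *f, size_t idx, struct toy_slot *slot)
{
	assert(idx < TOY_MAX_BINDINGS);
	f->bindings[idx] = slot;
}

/* Memslot deletion: drop the binding without touching the vCPU. */
static void toy_unbind(struct toy_gmem *f, size_t idx)
{
	assert(idx < TOY_MAX_BINDINGS);
	f->bindings[idx] = NULL;
}

/* Invalidation walks only live bindings; returns how many vCPUs were kicked. */
static int toy_invalidate(struct toy_gmem *f)
{
	int kicked = 0;

	for (size_t i = 0; i < TOY_MAX_BINDINGS; i++) {
		if (!f->bindings[i])
			continue;
		/* A kicked vCPU stops using the VMSA page in this model. */
		f->bindings[i]->vcpu->vmsa_in_use = false;
		kicked++;
	}
	return kicked;
}

/* Models the RMP transition on free: it fails while the page is in use. */
static bool toy_free_page(const struct toy_vcpu *vcpu)
{
	return !vcpu->vmsa_in_use;
}
```

With the slot still bound, toy_invalidate() kicks the vCPU and toy_free_page()
succeeds.  After toy_unbind(), toy_invalidate() kicks nothing and
toy_free_page() fails, which is the ordering this question is about.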

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625222229.3367197-1-seanjc@google.com?part=5


* Re: [PATCH 5/7] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped
  2026-06-25 22:45   ` sashiko-bot
@ 2026-06-25 23:28     ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 23:28 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvm

On Thu, Jun 25, 2026, sashiko-bot@kernel.org wrote:
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 1618acc3ca647..8ec5041934dbc 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -185,6 +185,10 @@ static void __kvm_gmem_invalidate_start(struct gmem_file *f, pgoff_t start,
> >  		}
> >  
> >  		flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
> > +
> > +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> > +		kvm_arch_gmem_invalidate_range(kvm, &gfn_range);
> > +#endif
> >  	}
> >  
> >  	if (flush)
> 
> [Severity: Critical]

*sigh*  If it's so critical, why couldn't you have found this in one of the
several times I had you review this off-list?

> Does this invalidation path cover the case where the memslot is unbound before
> the guest_memfd pages are freed?

No, it doesn't.  I think this would fix things?  I'll try to verify the bug and the fix...

diff --git arch/x86/kvm/mmu/mmu.c arch/x86/kvm/mmu/mmu.c
index 6c13da942bfc..2beb95394282 100644
--- arch/x86/kvm/mmu/mmu.c
+++ arch/x86/kvm/mmu/mmu.c
@@ -7560,8 +7560,14 @@ static void kvm_mmu_zap_memslot_pages_and_flush(struct kvm *kvm,
        kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
 }
 
-static void kvm_mmu_zap_memslot(struct kvm *kvm,
-                               struct kvm_memory_slot *slot)
+static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
+{
+       return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
+              kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+                                  struct kvm_memory_slot *slot)
 {
        struct kvm_gfn_range range = {
                .slot = slot,
@@ -7572,25 +7578,18 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm,
        };
        bool flush;
 
-       write_lock(&kvm->mmu_lock);
-       flush = kvm_unmap_gfn_range(kvm, &range);
-       kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
-       write_unlock(&kvm->mmu_lock);
-}
-
-static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
-{
-       return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
-              kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
-}
-
-void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
-                                  struct kvm_memory_slot *slot)
-{
-       if (kvm_memslot_flush_zap_all(kvm))
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+       if (slot->gmem.file)
+               kvm_arch_gmem_invalidate_range(kvm, &range);
+#endif
+       if (kvm_memslot_flush_zap_all(kvm)) {
                kvm_mmu_zap_all_fast(kvm);
-       else
-               kvm_mmu_zap_memslot(kvm, slot);
+       } else {
+               write_lock(&kvm->mmu_lock);
+               flush = kvm_unmap_gfn_range(kvm, &range);
+               kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
+               write_unlock(&kvm->mmu_lock);
+       }
 }
 
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)


* [PATCH 6/7] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y
  2026-06-25 22:22 [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA Sean Christopherson
                   ` (4 preceding siblings ...)
  2026-06-25 22:22 ` [PATCH 5/7] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped Sean Christopherson
@ 2026-06-25 22:22 ` Sean Christopherson
  2026-06-25 22:22 ` [PATCH 7/7] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid Sean Christopherson
  6 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 22:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Hyunwoo Kim, Tom Lendacky, Michael Roth,
	Jörg Rödel

Wrap the .gmem_prepare() declarations with CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
so that non-SEV code doesn't try to wire up a callback without doing the
necessary enabling.

No functional change intended.

Fixes: 3bb2531e20bf ("KVM: guest_memfd: Add hook for initializing memory")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 4 +++-
 arch/x86/include/asm/kvm_host.h    | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 69ca2a848ad6..fd08454c7553 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -146,12 +146,14 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
-KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
+#endif
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 KVM_X86_OP_OPTIONAL(gmem_invalidate_range)
 KVM_X86_OP_OPTIONAL(gmem_free_folio)
 #endif
+KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
 #endif
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 776272dc6fdc..ee47a0d1feb9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1903,7 +1903,9 @@ struct kvm_x86_ops {
 
 	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+#endif
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	void (*gmem_invalidate_range)(struct kvm *kvm, struct kvm_gfn_range *range);
 	void (*gmem_free_folio)(struct folio *folio);
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH 7/7] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid
  2026-06-25 22:22 [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA Sean Christopherson
                   ` (5 preceding siblings ...)
  2026-06-25 22:22 ` [PATCH 6/7] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y Sean Christopherson
@ 2026-06-25 22:22 ` Sean Christopherson
  6 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-06-25 22:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Hyunwoo Kim, Tom Lendacky, Michael Roth,
	Jörg Rödel

Track the guest as having a guest-provided VMSA as soon as control.vmsa_pa
is invalidated, instead of waiting to see if the guest-provided VMSA is
usable, so that KVM doesn't switch back to the original VMSA instead of
exiting to userspace (due to an invalid VMSA).  By the time a vCPU tries
to load a guest-provided VMSA, KVM has already communicated "success" for
AP creation, i.e. KVM has committed to using the guest-provided VMSA.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 04be49b1af57..7e06ed16f474 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4001,23 +4001,6 @@ static void __sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	 */
 	vmcb_mark_all_dirty(svm->vmcb);
 
-	if (!VALID_PAGE(gpa))
-		return;
-
-	slot = gfn_to_memslot(vcpu->kvm, gfn);
-	if (!slot)
-		return;
-
-	mmu_seq = kvm->mmu_invalidate_seq;
-	smp_rmb();
-
-	/*
-	 * The new VMSA will be private memory guest memory, so retrieve the
-	 * PFN from the gmem backend.
-	 */
-	if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
-		return;
-
 	/*
 	 * From this point forward, the VMSA will always be a guest-mapped page
 	 * rather than the initial one allocated by KVM in svm->sev_es.vmsa. In
@@ -4029,6 +4012,23 @@ static void __sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	 */
 	svm->sev_es.snp_has_guest_vmsa = true;
 
+	if (!VALID_PAGE(gpa))
+		return;
+
+	slot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!slot)
+		return;
+
+	mmu_seq = kvm->mmu_invalidate_seq;
+	smp_rmb();
+
+	/*
+	 * The new VMSA will be private memory guest memory, so retrieve the
+	 * PFN from the gmem backend.
+	 */
+	if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
+		return;
+
 	read_lock(&kvm->mmu_lock);
 	/*
 	 * Save the guest-provided GPA.  If retry is needed, then KVM will try
-- 
2.55.0.rc0.799.gd6f94ed593-goog



Thread overview: 11+ messages
2026-06-25 22:22 [PATCH 0/7] KVM: SEV: Fix RMP #PF due freeing in-use VMSA Sean Christopherson
2026-06-25 22:22 ` [PATCH 1/7] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
2026-06-25 22:22 ` [PATCH 2/7] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper Sean Christopherson
2026-06-25 22:22 ` [PATCH 3/7] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable Sean Christopherson
2026-06-25 22:22 ` [PATCH 4/7] KVM: Rework .gmem_invalidate() into .gmem_free_folio() Sean Christopherson
2026-06-25 22:39   ` sashiko-bot
2026-06-25 22:22 ` [PATCH 5/7] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped Sean Christopherson
2026-06-25 22:45   ` sashiko-bot
2026-06-25 23:28     ` Sean Christopherson
2026-06-25 22:22 ` [PATCH 6/7] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y Sean Christopherson
2026-06-25 22:22 ` [PATCH 7/7] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid Sean Christopherson
