[PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA


* [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA
@ 2026-06-30 22:25 Sean Christopherson
  2026-06-30 22:25 ` [PATCH v3 01/12] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
                   ` (11 more replies)
  0 siblings, 12 replies; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:25 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Rework KVM's handling of guest-provided (and always guest_memfd-backed) VMSAs
to forcefully reclaim VMSA pages when the pages are being freed from their
backing gmem instance, e.g. in response to PUNCH_HOLE.  In the worst case
scenario, marking the page SHARED in the RMP will fail due to the page being
IN_USE, ultimately leading to RMP #PF violations due to guest_memfd freeing
the memory back to the kernel while it's still assigned to a VM.

Note, the implementation is nearly identical to that used by KVM for VMX's APIC
access page (which isn't guest controlled, but is migratable and whose PA is
shoved directly into a vCPU control structure).

v3:
 - Ensure disabling quirks while the VM is live won't result in KVM skipping
   the back-half of "zap all fast". [Sashiko]
 - s/gmem_free_folio/gmem_reclaim_memory. [Ackerley]
 - Use {READ,WRITE}_ONCE() when accessing the guest's VMSA GPA outside of the
   per-vCPU mutex, and comment. [Sashiko]

v2:
 - https://lore.kernel.org/all/20260626231416.3943216-1-seanjc@google.com
 - Invalidate VMSAs if the memslot is DELETED or MOVED. [Sashiko]
 - Limit stable@ patches without a Fixes to 6.12+.

v1: https://lore.kernel.org/all/20260625222229.3367197-2-seanjc@google.com

Sean Christopherson (12):
  KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP
    guests
  KVM: SEV: Extract loading of guest-provided VMSA to a separate helper
  KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable
  KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory()
  KVM: x86: Serialize writes to disabled_quirks using kvm->lock
  KVM: x86: Ensure runtime reads of disabled_quirks are resolved once
  KVM: x86/mmu: Fold kvm_mmu_zap_memslot() into
    kvm_arch_flush_shadow_memslot()
  KVM: x86/mmu: Split kvm_mmu_zap_all_fast() into "front" and "back"
    halves
  KVM: x86/mmu: Use split "zap all fast" helpers when invalidating
    memslot
  KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is
    zapped
  KVM: x86: Guard .gmem_prepare() declarations with
    HAVE_KVM_GMEM_PREPARE=y
  KVM: SEV: Mark vCPU as having guest-provided VMSA even if it's invalid

 arch/x86/include/asm/kvm-x86-ops.h |   8 +-
 arch/x86/include/asm/kvm_host.h    |  10 +-
 arch/x86/kvm/mmu/mmu.c             |  74 +++++++------
 arch/x86/kvm/svm/sev.c             | 163 +++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.c             |   6 +-
 arch/x86/kvm/svm/svm.h             |   8 +-
 arch/x86/kvm/x86.c                 |  15 ++-
 arch/x86/kvm/x86.h                 |   2 +-
 include/linux/kvm_host.h           |   3 +-
 virt/kvm/guest_memfd.c             |   6 +-
 10 files changed, 210 insertions(+), 85 deletions(-)


base-commit: a204badd8432f93b7e862e7dac6db0fe3d65f370
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 01/12] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
@ 2026-06-30 22:25 ` Sean Christopherson
  2026-07-01 19:33   ` Michael Roth
  2026-06-30 22:25 ` [PATCH v3 02/12] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper Sean Christopherson
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:25 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Track the GPA of the guest-provided VMSA used after AP_CREATION events when
running SNP guests, instead of simply tracking whether or not the vCPU is
using a guest-provided VMSA.  KVM needs to know the GPA of the VMSA that's
actively being used so that it can react to MMU invalidation events, i.e.
so that KVM can drop the VMSA if its backing guest_memfd page is punched
out of existence.

Opportunistically rename snp_vmsa_gpa to snp_pending_vmsa_gpa to clarify
that it tracks the pending VMSA GPA, whereas snp_guest_vmsa_gpa now tracks
the in-use VMSA GPA.

Note!  Take care to track the GPA, not the GFN, as VALID_PAGE() won't
behave correctly if an invalid GFN is converted to a GPA for checking.
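
As a rough illustration of the pitfall, here is a minimal userspace
sketch.  The typedefs and macros are stand-ins modeled on the kernel's
definitions (an assumption, they are not copied from the tree), and
INVALID_GFN is a hypothetical sentinel used purely for demonstration.
Shifting an all-ones GFN left by PAGE_SHIFT clears the low bits, so the
result no longer compares equal to INVALID_PAGE:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins modeled on the kernel's definitions (assumed). */
typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

#define PAGE_SHIFT	12
#define INVALID_PAGE	(~(gpa_t)0)
#define VALID_PAGE(x)	((x) != INVALID_PAGE)

/* Hypothetical "invalid GFN" sentinel, for illustration only. */
#define INVALID_GFN	(~(gfn_t)0)

static inline gpa_t gfn_to_gpa(gfn_t gfn)
{
	return (gpa_t)gfn << PAGE_SHIFT;
}

/* Returns true if an invalid GFN is still seen as invalid after conversion. */
static inline bool invalid_gfn_stays_invalid(void)
{
	return !VALID_PAGE(gfn_to_gpa(INVALID_GFN));
}
```

In this model invalid_gfn_stays_invalid() returns false, i.e. the
converted value is misreported as a valid page, whereas storing
INVALID_PAGE directly as a GPA is detected as expected.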

Note #2!  Keep snp_has_guest_vmsa so that switching to a guest-provided
VMSA is sticky, even if the guest-provided VMSA becomes invalid.

No functional change intended.

Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 14 +++++++++-----
 arch/x86/kvm/svm/svm.h |  3 ++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 74fb15551e83..827f5dc06102 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4003,6 +4003,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 
 	/* Clear use of the VMSA */
 	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
 
 	/*
 	 * When replacing the VMSA during SEV-SNP AP creation,
@@ -4010,11 +4011,11 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	 */
 	vmcb_mark_all_dirty(svm->vmcb);
 
-	if (!VALID_PAGE(svm->sev_es.snp_vmsa_gpa))
+	if (!VALID_PAGE(svm->sev_es.snp_pending_vmsa_gpa))
 		return;
 
-	gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
-	svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	gfn = gpa_to_gfn(svm->sev_es.snp_pending_vmsa_gpa);
+	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
 
 	slot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (!slot)
@@ -4039,6 +4040,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	svm->sev_es.snp_has_guest_vmsa = true;
 
 	/* Use the new VMSA */
+	svm->sev_es.snp_guest_vmsa_gpa = gfn_to_gpa(gfn);
 	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
 
 	/* Mark the vCPU as runnable */
@@ -4105,10 +4107,10 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 			return -EINVAL;
 		}
 
-		target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		target_svm->sev_es.snp_pending_vmsa_gpa = svm->vmcb->control.exit_info_2;
 		break;
 	case SVM_VMGEXIT_AP_DESTROY:
-		target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+		target_svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
 		break;
 	default:
 		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
@@ -4791,6 +4793,8 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu)
 		return -ENOMEM;
 
 	svm->sev_es.vmsa = page_address(vmsa_page);
+	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
+	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
 
 	vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 716be21fba33..d077783c287e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -271,7 +271,8 @@ struct vcpu_sev_es_state {
 	u64 ghcb_registered_gpa;
 
 	struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
-	gpa_t snp_vmsa_gpa;
+	gpa_t snp_pending_vmsa_gpa;
+	gpa_t snp_guest_vmsa_gpa;
 	bool snp_ap_waiting_for_reset;
 	bool snp_has_guest_vmsa;
 };
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 02/12] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
  2026-06-30 22:25 ` [PATCH v3 01/12] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
@ 2026-06-30 22:25 ` Sean Christopherson
  2026-07-01 19:34   ` Michael Roth
  2026-06-30 22:25 ` [PATCH v3 03/12] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable Sean Christopherson
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:25 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Extract the loading/retrieval of a guest-provided VMSA to a separate helper
so that KVM can reuse the core logic when refreshing the VMSA after an MMU
invalidation from guest_memfd.

No functional change intended.

Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 52 +++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 827f5dc06102..d8ed00f76aa3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3979,29 +3979,17 @@ static int snp_begin_psc(struct vcpu_svm *svm)
 	return snp_do_psc(svm);
 }
 
-/*
- * Invoked as part of svm_vcpu_reset() processing of an init event.
- */
-static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_memory_slot *slot;
+	gfn_t gfn = gpa_to_gfn(gpa);
 	struct page *page;
 	kvm_pfn_t pfn;
-	gfn_t gfn;
 
-	guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
+	lockdep_assert_held(&svm->sev_es.snp_vmsa_mutex);
 
-	if (!svm->sev_es.snp_ap_waiting_for_reset)
-		return;
-
-	svm->sev_es.snp_ap_waiting_for_reset = false;
-
-	/* Mark the vCPU as offline and not runnable */
-	vcpu->arch.pv.pv_unhalted = false;
-	kvm_set_mp_state(vcpu, KVM_MP_STATE_HALTED);
-
-	/* Clear use of the VMSA */
+	/* Clear use of the VMSA. */
 	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
 	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
 
@@ -4011,12 +3999,9 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	 */
 	vmcb_mark_all_dirty(svm->vmcb);
 
-	if (!VALID_PAGE(svm->sev_es.snp_pending_vmsa_gpa))
+	if (!VALID_PAGE(gpa))
 		return;
 
-	gfn = gpa_to_gfn(svm->sev_es.snp_pending_vmsa_gpa);
-	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
-
 	slot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (!slot)
 		return;
@@ -4040,7 +4025,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	svm->sev_es.snp_has_guest_vmsa = true;
 
 	/* Use the new VMSA */
-	svm->sev_es.snp_guest_vmsa_gpa = gfn_to_gpa(gfn);
+	svm->sev_es.snp_guest_vmsa_gpa = gpa;
 	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
 
 	/* Mark the vCPU as runnable */
@@ -4054,6 +4039,31 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	kvm_release_page_clean(page);
 }
 
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	gpa_t gpa;
+
+	guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
+
+	if (!svm->sev_es.snp_ap_waiting_for_reset)
+		return;
+
+	svm->sev_es.snp_ap_waiting_for_reset = false;
+
+	/* Mark the vCPU as offline and not runnable */
+	vcpu->arch.pv.pv_unhalted = false;
+	kvm_set_mp_state(vcpu, KVM_MP_STATE_HALTED);
+
+	gpa = svm->sev_es.snp_pending_vmsa_gpa;
+	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
+
+	sev_snp_reload_vmsa(vcpu, gpa);
+}
+
 static int sev_snp_ap_creation(struct vcpu_svm *svm)
 {
 	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 03/12] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
  2026-06-30 22:25 ` [PATCH v3 01/12] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
  2026-06-30 22:25 ` [PATCH v3 02/12] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper Sean Christopherson
@ 2026-06-30 22:25 ` Sean Christopherson
  2026-07-01 19:36   ` Michael Roth
  2026-06-30 22:25 ` [PATCH v3 04/12] KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory() Sean Christopherson
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:25 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Always mark the vCPU as RUNNABLE after responding to AP_CREATE, even if the
guest-specified VMSA is unusable, e.g. isn't backed by a memslot or doesn't
have a backing guest_memfd page.  If the VMSA is unusable, leaving the vCPU
in a non-running state will effectively hang the vCPU instead of reporting
an error to userspace.  This will also allow retrying the VMSA load in the
future, to fix a bug where KVM doesn't honor guest_memfd invalidation
events, e.g. if AP_CREATION races with PUNCH_HOLE.
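
The intended state transitions can be sketched with a toy model.  All
names below (toy_vcpu, toy_reload_vmsa(), the backing_ok flag) are
hypothetical and exist only for illustration; the real code tracks more
state and takes the per-vCPU VMSA mutex.  The point is only that a valid
pending GPA, i.e. a CREATE request, makes the vCPU runnable regardless of
whether the VMSA could be installed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gpa_t;

#define INVALID_PAGE	(~(gpa_t)0)
#define VALID_PAGE(x)	((x) != INVALID_PAGE)

enum toy_mp_state { TOY_MP_HALTED, TOY_MP_RUNNABLE };

struct toy_vcpu {
	enum toy_mp_state mp_state;
	gpa_t pending_vmsa_gpa;
	gpa_t guest_vmsa_gpa;	/* INVALID_PAGE if no VMSA is installed */
};

/* Hypothetical loader: installs the VMSA only if its backing is usable. */
static void toy_reload_vmsa(struct toy_vcpu *vcpu, gpa_t gpa, bool backing_ok)
{
	vcpu->guest_vmsa_gpa = INVALID_PAGE;
	if (VALID_PAGE(gpa) && backing_ok)
		vcpu->guest_vmsa_gpa = gpa;
}

static void toy_init_protected_state(struct toy_vcpu *vcpu, bool backing_ok)
{
	gpa_t gpa = vcpu->pending_vmsa_gpa;

	vcpu->pending_vmsa_gpa = INVALID_PAGE;
	vcpu->mp_state = TOY_MP_HALTED;

	toy_reload_vmsa(vcpu, gpa, backing_ok);

	/* CREATE (valid GPA) => runnable, even if the load failed. */
	if (VALID_PAGE(gpa))
		vcpu->mp_state = TOY_MP_RUNNABLE;
}
```

With a valid pending GPA and unusable backing, the toy vCPU ends up
runnable with no VMSA installed, which in the real code is what lets
KVM_RUN fail instead of blocking.  With an invalid pending GPA (DESTROY),
it stays halted.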

Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d8ed00f76aa3..30792adcfc8e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4028,9 +4028,6 @@ static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	svm->sev_es.snp_guest_vmsa_gpa = gpa;
 	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
 
-	/* Mark the vCPU as runnable */
-	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
 	/*
 	 * gmem pages aren't currently migratable, but if this ever changes
 	 * then care should be taken to ensure svm->sev_es.vmsa is pinned
@@ -4062,6 +4059,15 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
 
 	sev_snp_reload_vmsa(vcpu, gpa);
+
+	/*
+	 * Mark the vCPU as runnable for CREATE requests, indicated by a valid
+	 * VMSA GPA, even if installing the VMSA failed, so that KVM_RUN will
+	 * fail instead of blocking indefinitely and hanging the vCPU, e.g. if
+	 * the backing guest_memfd page is unavailable.
+	 */
+	if (VALID_PAGE(gpa))
+		kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
 }
 
 static int sev_snp_ap_creation(struct vcpu_svm *svm)
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 04/12] KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory()
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (2 preceding siblings ...)
  2026-06-30 22:25 ` [PATCH v3 03/12] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable Sean Christopherson
@ 2026-06-30 22:25 ` Sean Christopherson
  2026-06-30 22:39   ` sashiko-bot
  2026-07-01 19:41   ` Michael Roth
  2026-06-30 22:26 ` [PATCH v3 05/12] KVM: x86: Serialize writes to disabled_quirks using kvm->lock Sean Christopherson
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:25 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Rename .gmem_invalidate() to .gmem_reclaim_memory() as the hook is called
when a folio is freed, which is far too late and lacks sufficient
information for KVM to actually invalidate its usage of the memory.

Keep guest_memfd's trampoline, even though it would be trivial to wire up
.free_folio() directly to an arch callback, to avoid bleeding guest_memfd
internals into arch code (specifically, avoid referencing folios in arch
code).
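
For readers unfamiliar with the layering, the trampoline idea can be
sketched as follows.  struct toy_folio and both function names are
hypothetical stand-ins, not guest_memfd's actual types; the sketch only
shows the folio details being reduced to a plain [start, end) PFN range
before the "arch" side is invoked:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t kvm_pfn_t;

/* Hypothetical stand-in for a folio: first PFN plus allocation order. */
struct toy_folio {
	kvm_pfn_t first_pfn;
	unsigned int order;
};

/* Records the last range seen by the "arch" side, for demonstration. */
static kvm_pfn_t toy_last_start, toy_last_end;

/* "Arch" callback: sees only a [start, end) PFN range, never a folio. */
static void toy_arch_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end)
{
	toy_last_start = start;
	toy_last_end = end;
}

/* Trampoline: translates folio details into a plain PFN range. */
static void toy_free_folio(const struct toy_folio *folio)
{
	kvm_pfn_t pfn = folio->first_pfn;

	toy_arch_reclaim_memory(pfn, pfn + (1ul << folio->order));
}
```

The extra hop costs one function call, and in exchange the arch callback
never needs to know how the generic layer represents the memory.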

Opportunistically guard kvm_x86_ops.gmem_reclaim_memory() with an ifdef to
ensure the callback will actually be called, e.g. so that non-SEV code
doesn't try to wire up a callback without enabling
CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE.

No functional change intended.

Cc: Ackerley Tng <ackerleytng@google.com>
Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 4 +++-
 arch/x86/include/asm/kvm_host.h    | 4 +++-
 arch/x86/kvm/svm/sev.c             | 2 +-
 arch/x86/kvm/svm/svm.c             | 4 +++-
 arch/x86/kvm/svm/svm.h             | 3 +--
 arch/x86/kvm/x86.c                 | 4 ++--
 include/linux/kvm_host.h           | 2 +-
 virt/kvm/guest_memfd.c             | 2 +-
 8 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 83dc5086138b..acae9f6d6c5e 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -147,7 +147,9 @@ KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
 KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
-KVM_X86_OP_OPTIONAL(gmem_invalidate)
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+KVM_X86_OP_OPTIONAL(gmem_reclaim_memory)
+#endif
 #endif
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b517257a6315..5e8603deb252 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1901,7 +1901,9 @@ struct kvm_x86_ops {
 	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
-	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+	void (*gmem_reclaim_memory)(kvm_pfn_t start, kvm_pfn_t end);
+#endif
 	int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 };
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 30792adcfc8e..4465e75494f2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -5136,7 +5136,7 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 	return 0;
 }
 
-void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+void sev_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end)
 {
 	kvm_pfn_t pfn;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ef69a51ab27f..6be0000ab386 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5460,9 +5460,11 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
+#ifdef CONFIG_KVM_AMD_SEV
 	.gmem_prepare = sev_gmem_prepare,
-	.gmem_invalidate = sev_gmem_invalidate,
+	.gmem_reclaim_memory = sev_gmem_reclaim_memory,
 	.gmem_max_mapping_level = sev_gmem_max_mapping_level,
+#endif
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d077783c287e..cf7c1a437f38 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -1009,7 +1009,7 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
-void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+void sev_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end);
 int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
 void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa);
@@ -1039,7 +1039,6 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 {
 	return 0;
 }
-static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
 static inline int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
 	return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0626e835e9eb..6b9a1b0b1460 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10592,9 +10592,9 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
 #endif
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+void kvm_arch_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end)
 {
-	kvm_x86_call(gmem_invalidate)(start, end);
+	kvm_x86_call(gmem_reclaim_memory)(start, end);
 }
 #endif
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab8cfaec82d3..d777eaadbcd2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2607,7 +2607,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
 #endif
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+void kvm_arch_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end);
 #endif
 
 #ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 86690683b2fe..db0fcc38b145 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -530,7 +530,7 @@ static void kvm_gmem_free_folio(struct folio *folio)
 	kvm_pfn_t pfn = page_to_pfn(page);
 	int order = folio_order(folio);
 
-	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
+	kvm_arch_gmem_reclaim_memory(pfn, pfn + (1ul << order));
 }
 #endif
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 05/12] KVM: x86: Serialize writes to disabled_quirks using kvm->lock
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (3 preceding siblings ...)
  2026-06-30 22:25 ` [PATCH v3 04/12] KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory() Sean Christopherson
@ 2026-06-30 22:26 ` Sean Christopherson
  2026-07-01 21:59   ` Michael Roth
  2026-06-30 22:26 ` [PATCH v3 06/12] KVM: x86: Ensure runtime reads of disabled_quirks are resolved once Sean Christopherson
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:26 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Protect writes to disabled_quirks with kvm->lock to ensure KVM doesn't
clobber state in the unlikely scenario that userspace disables disparate
quirks from multiple tasks.  More importantly, this will allow wrapping
accesses with {READ,WRITE}_ONCE without "needing" to also guard the writer
with a useless and confusing READ_ONCE (since the RMW wouldn't be atomic
anyways).
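
A userspace analogy of the serialized read-modify-write, using pthreads
in place of kvm->lock.  Everything here (toy_vm, toy_disable_quirks(),
the worker plumbing) is a hypothetical sketch and not KVM code; it only
illustrates that holding a lock across the "|=" keeps two tasks that
disable different quirks from clobbering each other's bits:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct toy_vm {
	pthread_mutex_t lock;
	uint64_t disabled_quirks;
};

struct toy_req {
	struct toy_vm *vm;
	uint64_t quirks;
};

/* Serialize the read-modify-write so concurrent callers can't lose bits. */
static void toy_disable_quirks(struct toy_vm *vm, uint64_t quirks)
{
	pthread_mutex_lock(&vm->lock);
	vm->disabled_quirks |= quirks;
	pthread_mutex_unlock(&vm->lock);
}

static void *toy_worker(void *arg)
{
	struct toy_req *req = arg;

	toy_disable_quirks(req->vm, req->quirks);
	return NULL;
}

/* Disables two disparate quirks from two tasks; returns the final mask. */
static uint64_t toy_disable_from_two_tasks(uint64_t a, uint64_t b)
{
	struct toy_vm vm = { .disabled_quirks = 0 };
	struct toy_req reqs[2] = { { &vm, a }, { &vm, b } };
	pthread_t threads[2];
	int i;

	pthread_mutex_init(&vm.lock, NULL);

	for (i = 0; i < 2; i++)
		pthread_create(&threads[i], NULL, toy_worker, &reqs[i]);
	for (i = 0; i < 2; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_destroy(&vm.lock);
	return vm.disabled_quirks;
}
```

Without the mutex, both tasks could read the same old mask and the later
store would silently drop the other task's bit.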

Ideally, KVM would disallow disabling quirks once quirks are "live", but
that would be a potentially breaking userspace ABI change.  And while all
existing quirks are fully live only after vCPUs have been created, several
MMU-related quirks, IGNORE_GUEST_PAT and SLOT_ZAP_ALL, are partially live
at all times.  Because populating MMUs requires a vCPU, the guest-visible
behavior of IGNORE_GUEST_PAT and SLOT_ZAP_ALL requires a vCPU.  But for KVM
itself, processing the quirk (or not) has functional impact regardless,
i.e. for all intents and purposes, KVM can't prevent those quirks from
being disabled after they've been consumed.

Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b9a1b0b1460..74f1d7169218 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3939,7 +3939,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			break;
 		fallthrough;
 	case KVM_CAP_DISABLE_QUIRKS:
+		mutex_lock(&kvm->lock);
 		kvm->arch.disabled_quirks |= cap->args[0] & kvm_caps.supported_quirks;
+		mutex_unlock(&kvm->lock);
 		r = 0;
 		break;
 	case KVM_CAP_SPLIT_IRQCHIP: {
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 06/12] KVM: x86: Ensure runtime reads of disabled_quirks are resolved once
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (4 preceding siblings ...)
  2026-06-30 22:26 ` [PATCH v3 05/12] KVM: x86: Serialize writes to disabled_quirks using kvm->lock Sean Christopherson
@ 2026-06-30 22:26 ` Sean Christopherson
  2026-07-01 22:00   ` Michael Roth
  2026-06-30 22:26 ` [PATCH v3 07/12] KVM: x86/mmu: Fold kvm_mmu_zap_memslot() into kvm_arch_flush_shadow_memslot() Sean Christopherson
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:26 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Wrap the sole reader of disabled_quirks with READ_ONCE(), and wrap the
post-VM-creation write to disabled_quirks with WRITE_ONCE(), to ensure
checking the status of a quirk doesn't re-read disabled_quirks *if* the
caller needs such a guarantee.  This will allow splitting the "fast" MMU
zap into front and back halves, without potentially skipping the back
half if SLOT_ZAP_ALL were concurrently disabled (which would be "fine" in
the current code base, but far from ideal).
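
The "resolve once" pattern can be sketched in userspace as below.  The
READ_ONCE()/WRITE_ONCE() macros are simplified stand-ins for the kernel's
(an assumption), and toy_vm, TOY_QUIRK_SLOT_ZAP_ALL (an arbitrary bit) and
toy_flush_memslot() are hypothetical.  The caller snapshots the quirk
check into a local and uses that single answer for both halves, so a
disable that lands in between cannot cause the back half to be skipped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros (assumed). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Arbitrary bit chosen for this sketch. */
#define TOY_QUIRK_SLOT_ZAP_ALL	(1ull << 7)

struct toy_vm {
	uint64_t disabled_quirks;
	int front_runs, back_runs;
};

static bool toy_check_has_quirk(struct toy_vm *vm, uint64_t quirk)
{
	return !(READ_ONCE(vm->disabled_quirks) & quirk);
}

/*
 * Resolve the quirk once and reuse the answer, so that a disable between
 * the two halves can't cause the back half to be skipped.
 */
static void toy_flush_memslot(struct toy_vm *vm, bool disable_midway)
{
	const bool zap_all = toy_check_has_quirk(vm, TOY_QUIRK_SLOT_ZAP_ALL);

	if (zap_all)
		vm->front_runs++;

	/* Simulates userspace disabling the quirk between the two halves. */
	if (disable_midway)
		WRITE_ONCE(vm->disabled_quirks,
			   vm->disabled_quirks | TOY_QUIRK_SLOT_ZAP_ALL);

	if (zap_all)
		vm->back_runs++;
}
```

Had the second check called toy_check_has_quirk() again instead of using
the local, the front half would have run without its matching back half.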

Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 3 ++-
 arch/x86/kvm/x86.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 74f1d7169218..bc0c3163f4a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3940,7 +3940,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		fallthrough;
 	case KVM_CAP_DISABLE_QUIRKS:
 		mutex_lock(&kvm->lock);
-		kvm->arch.disabled_quirks |= cap->args[0] & kvm_caps.supported_quirks;
+		WRITE_ONCE(kvm->arch.disabled_quirks,
+			   kvm->arch.disabled_quirks | (cap->args[0] & kvm_caps.supported_quirks));
 		mutex_unlock(&kvm->lock);
 		r = 0;
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 8ece468087a8..75f13d88db58 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -304,7 +304,7 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 
 static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
 {
-	return !(kvm->arch.disabled_quirks & quirk);
+	return !(READ_ONCE(kvm->arch.disabled_quirks) & quirk);
 }
 
 static __always_inline void kvm_request_l1tf_flush_l1d(void)
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 07/12] KVM: x86/mmu: Fold kvm_mmu_zap_memslot() into kvm_arch_flush_shadow_memslot()
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (5 preceding siblings ...)
  2026-06-30 22:26 ` [PATCH v3 06/12] KVM: x86: Ensure runtime reads of disabled_quirks are resolved once Sean Christopherson
@ 2026-06-30 22:26 ` Sean Christopherson
  2026-07-01 22:04   ` Michael Roth
  2026-06-30 22:26 ` [PATCH v3 08/12] KVM: x86/mmu: Split kvm_mmu_zap_all_fast() into "front" and "back" halves Sean Christopherson
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:26 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Fold kvm_mmu_zap_memslot() into its sole caller so that its GFN range
structure can be used to trigger guest_memfd invalidations regardless of
whether KVM will do a partial or full zap of the MMU.

No functional change intended.

Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6c13da942bfc..223d80b12b9b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7560,8 +7560,14 @@ static void kvm_mmu_zap_memslot_pages_and_flush(struct kvm *kvm,
 	kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
 }
 
-static void kvm_mmu_zap_memslot(struct kvm *kvm,
-				struct kvm_memory_slot *slot)
+static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
+{
+	return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
+	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+				   struct kvm_memory_slot *slot)
 {
 	struct kvm_gfn_range range = {
 		.slot = slot,
@@ -7572,25 +7578,14 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm,
 	};
 	bool flush;
 
-	write_lock(&kvm->mmu_lock);
-	flush = kvm_unmap_gfn_range(kvm, &range);
-	kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
-	write_unlock(&kvm->mmu_lock);
-}
-
-static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
-{
-	return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
-	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
-}
-
-void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
-				   struct kvm_memory_slot *slot)
-{
-	if (kvm_memslot_flush_zap_all(kvm))
+	if (kvm_memslot_flush_zap_all(kvm)) {
 		kvm_mmu_zap_all_fast(kvm);
-	else
-		kvm_mmu_zap_memslot(kvm, slot);
+	} else {
+		write_lock(&kvm->mmu_lock);
+		flush = kvm_unmap_gfn_range(kvm, &range);
+		kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
+		write_unlock(&kvm->mmu_lock);
+	}
 }
 
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 08/12] KVM: x86/mmu: Split kvm_mmu_zap_all_fast() into "front" and "back" halves
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (6 preceding siblings ...)
  2026-06-30 22:26 ` [PATCH v3 07/12] KVM: x86/mmu: Fold kvm_mmu_zap_memslot() into kvm_arch_flush_shadow_memslot() Sean Christopherson
@ 2026-06-30 22:26 ` Sean Christopherson
  2026-07-01 22:07   ` Michael Roth
  2026-06-30 22:26 ` [PATCH v3 09/12] KVM: x86/mmu: Use split "zap all fast" helpers when invalidating memslot Sean Christopherson
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:26 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Split kvm_mmu_zap_all_fast() into a "front half" and a "back half", where
the front half is everything that runs with mmu_lock held for write, and
the back half is the code that runs outside of mmu_lock.  This will allow
putting more code inside kvm_arch_flush_shadow_memslot()'s critical section
without having to take mmu_lock twice in quick succession.
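
The shape of the split, and why it helps a caller that wants extra work
in the same critical section, can be sketched with a single-threaded toy
model.  The "lock" is just a bool with assert()s standing in for the
lockdep annotations, and every name (toy_mmu, toy_zap_front_half(), the
extra_done counter) is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_mmu {
	bool lock_held;			/* toy stand-in for mmu_lock */
	int lock_acquisitions;
	int front_done, extra_done, back_done;
};

static void toy_write_lock(struct toy_mmu *mmu)
{
	assert(!mmu->lock_held);
	mmu->lock_held = true;
	mmu->lock_acquisitions++;
}

static void toy_write_unlock(struct toy_mmu *mmu)
{
	assert(mmu->lock_held);
	mmu->lock_held = false;
}

/* Front half: caller must hold the lock for write. */
static void toy_zap_front_half(struct toy_mmu *mmu)
{
	assert(mmu->lock_held);
	mmu->front_done++;
}

/* Back half: caller must NOT hold the lock. */
static void toy_zap_back_half(struct toy_mmu *mmu)
{
	assert(!mmu->lock_held);
	mmu->back_done++;
}

/* Plain wrapper, behaves like the unsplit function. */
static void toy_zap_all_fast(struct toy_mmu *mmu)
{
	toy_write_lock(mmu);
	toy_zap_front_half(mmu);
	toy_write_unlock(mmu);

	toy_zap_back_half(mmu);
}

/* Caller that adds work to the same critical section as the front half. */
static void toy_flush_memslot(struct toy_mmu *mmu)
{
	toy_write_lock(mmu);
	mmu->extra_done++;	/* hypothetical extra work under the lock */
	toy_zap_front_half(mmu);
	toy_write_unlock(mmu);

	toy_zap_back_half(mmu);
}
```

In the toy model, toy_flush_memslot() acquires the lock exactly once,
whereas doing the extra work in its own critical section and then calling
toy_zap_all_fast() would acquire it twice.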

No functional change intended.

Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 223d80b12b9b..a5c2a560a88a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6921,20 +6921,11 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 }
 
-/*
- * Fast invalidate all shadow pages and use lock-break technique
- * to zap obsolete pages.
- *
- * It's required when memslot is being deleted or VM is being
- * destroyed, in these cases, we should ensure that KVM MMU does
- * not use any resource of the being-deleted slot or all slots
- * after calling the function.
- */
-static void kvm_mmu_zap_all_fast(struct kvm *kvm)
+static void __kvm_mmu_zap_all_fast_front_half(struct kvm *kvm)
 {
 	lockdep_assert_held(&kvm->slots_lock);
+	lockdep_assert_held_write(&kvm->mmu_lock);
 
-	write_lock(&kvm->mmu_lock);
 	trace_kvm_mmu_zap_all_fast(kvm);
 
 	/*
@@ -6971,8 +6962,12 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
 
 	kvm_zap_obsolete_pages(kvm);
+}
 
-	write_unlock(&kvm->mmu_lock);
+static void __kvm_mmu_zap_all_fast_back_half(struct kvm *kvm)
+{
+	lockdep_assert_held(&kvm->slots_lock);
+	lockdep_assert_not_held(&kvm->mmu_lock);
 
 	/*
 	 * Zap the invalidated TDP MMU roots, all SPTEs must be dropped before
@@ -6986,6 +6981,24 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 		kvm_tdp_mmu_zap_invalidated_roots(kvm, true);
 }
 
+/*
+ * Fast invalidate all shadow pages and use lock-break technique
+ * to zap obsolete pages.
+ *
+ * It's required when memslot is being deleted or VM is being
+ * destroyed, in these cases, we should ensure that KVM MMU does
+ * not use any resource of the being-deleted slot or all slots
+ * after calling the function.
+ */
+static void kvm_mmu_zap_all_fast(struct kvm *kvm)
+{
+	write_lock(&kvm->mmu_lock);
+	__kvm_mmu_zap_all_fast_front_half(kvm);
+	write_unlock(&kvm->mmu_lock);
+
+	__kvm_mmu_zap_all_fast_back_half(kvm);
+}
+
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
 	int r, i;
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 09/12] KVM: x86/mmu: Use split "zap all fast" helpers when invalidating memslot
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (7 preceding siblings ...)
  2026-06-30 22:26 ` [PATCH v3 08/12] KVM: x86/mmu: Split kvm_mmu_zap_all_fast() into "front" and "back" halves Sean Christopherson
@ 2026-06-30 22:26 ` Sean Christopherson
  2026-07-01 22:19   ` Michael Roth
  2026-06-30 22:26 ` [PATCH v3 10/12] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped Sean Christopherson
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:26 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Manually invoke the front half and back half of the "zap all fast" flow
when invalidating a memslot so that mmu_lock is acquired at function scope
in kvm_arch_flush_shadow_memslot().  This will allow putting more code
inside the critical section without having to take mmu_lock twice in quick
succession.

Opportunistically open code checking whether or not to do the fast zap, to
discourage removing the local "zap_all" in a future cleanup, i.e. to ensure
the SLOT_ZAP_ALL quirk is queried exactly once.  Processing the front half
but not the back half of the fast zap (if SLOT_ZAP_ALL were disabled
concurrently) would result in KVM unnecessarily keeping invalid TDP MMU
roots until the VM is destroyed.

No functional change intended.

Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
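To illustrate why the quirk should be queried exactly once, here is a small
user-space model.  It is a sketch, not KVM code: struct vm_model, the
invalid_roots counter, and all function names are hypothetical, and the
"concurrent" disable is simulated by a direct call between the two halves.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VM; not the kernel's struct kvm. */
struct vm_model {
	atomic_bool slot_zap_all_quirk;	/* may be cleared at any time */
	int invalid_roots;		/* invalidated but not yet zapped */
};

static void vm_model_init(struct vm_model *vm, bool quirk)
{
	atomic_init(&vm->slot_zap_all_quirk, quirk);
	vm->invalid_roots = 0;
}

/* Front half invalidates roots, back half is what actually zaps them. */
static void front_half(struct vm_model *vm)
{
	vm->invalid_roots++;
}

static void back_half(struct vm_model *vm)
{
	vm->invalid_roots = 0;
}

/* Simulates userspace disabling the quirk while a flush is in flight. */
static void disable_quirk(struct vm_model *vm)
{
	atomic_store(&vm->slot_zap_all_quirk, false);
}

/* Queries the quirk twice, so the two halves can disagree. */
static void flush_query_twice(struct vm_model *vm)
{
	if (atomic_load(&vm->slot_zap_all_quirk))
		front_half(vm);

	disable_quirk(vm);

	if (atomic_load(&vm->slot_zap_all_quirk))
		back_half(vm);
}

/* Snapshots the quirk once, so the back half always follows the front. */
static void flush_query_once(struct vm_model *vm)
{
	const bool zap_all = atomic_load(&vm->slot_zap_all_quirk);

	if (zap_all)
		front_half(vm);

	disable_quirk(vm);

	if (zap_all)
		back_half(vm);
}
```

In the query-twice variant the invalidated roots are left behind, which is
the model's analogue of keeping invalid TDP MMU roots until VM destruction.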
 arch/x86/kvm/mmu/mmu.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a5c2a560a88a..3eb1f86593b1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7573,12 +7573,6 @@ static void kvm_mmu_zap_memslot_pages_and_flush(struct kvm *kvm,
 	kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
 }
 
-static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
-{
-	return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
-	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
-}
-
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
@@ -7589,16 +7583,23 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 		.may_block = true,
 		.attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED,
 	};
+	bool zap_all = kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
+		       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
 	bool flush;
 
-	if (kvm_memslot_flush_zap_all(kvm)) {
-		kvm_mmu_zap_all_fast(kvm);
+	write_lock(&kvm->mmu_lock);
+
+	if (zap_all) {
+		__kvm_mmu_zap_all_fast_front_half(kvm);
 	} else {
-		write_lock(&kvm->mmu_lock);
 		flush = kvm_unmap_gfn_range(kvm, &range);
 		kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
-		write_unlock(&kvm->mmu_lock);
 	}
+
+	write_unlock(&kvm->mmu_lock);
+
+	if (zap_all)
+		__kvm_mmu_zap_all_fast_back_half(kvm);
 }
 
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 10/12] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (8 preceding siblings ...)
  2026-06-30 22:26 ` [PATCH v3 09/12] KVM: x86/mmu: Use split "zap all fast" helpers when invalidating memslot Sean Christopherson
@ 2026-06-30 22:26 ` Sean Christopherson
  2026-07-01 21:56   ` Michael Roth
  2026-06-30 22:26 ` [PATCH v3 11/12] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y Sean Christopherson
  2026-06-30 22:26 ` [PATCH v3 12/12] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid Sean Christopherson
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:26 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Wire up a gmem_invalidate_range() call for SNP VMs, and use it to force
vCPUs to reload/recheck their guest-provided VMSA if the backing guest_memfd
page is being invalidated, e.g. is being PUNCH_HOLE'd.  Use the same core
logic to handle invalidations as VMX does for the APIC-access page, as the
two concepts are nearly identical (shove the physical address of a page into
the vCPU's control structure):

 1. Snapshot the invalidation sequence counter
 2. Grab the pfn (from guest_memfd in this case)
 3. Acquire mmu_lock for read
 4. Re-request reload if retry is needed, otherwise commit the change.

Note, the re-request action in #4 is necessary as KVM's retry logic is
fuzzy, i.e. can get false positives.  If the guest_memfd page has been
dropped, at some point a subsequent reload will fail to get a PFN from
guest_memfd, and KVM will fail KVM_RUN.  If the retry was due to a false
positive, KVM will retry until there are no relevant MMU notifier events
(and will retry in the "outer" loop, i.e. will drop locks and resched as
needed).

Note #2!  Take care to invalidate the VMSA when a relevant memslot is
DELETED or MOVED, as invalidations in response to PUNCH_HOLE are predicated
on memslot bindings (KVM doesn't know what GFN range(s) to invalidate
without a binding).  And more importantly, the VMSA mapping requires a
memslot, i.e. must be invalidated if its memslot disappears, regardless of
the state of the underlying guest_memfd inode.

Failure to invalidate the vCPU's control.vmsa_pa (which is checked by
pre_sev_run()) can prevent KVM from properly freeing the page as firmware
will reject the RMPUPDATE to reclaim the page with FAIL_INUSE if the vCPU
is actively running, i.e. if the VMSA page is in-use.  That in turn leads to an
RMP #PF on the next use, as the page will still be assigned to the SNP VM.

  SEV-SNP: RMPUPDATE failed for PFN 78d198, pg_level: 1, ret: 3
  SEV-SNP: PFN 0x78d198, RMP entry: [0xfff0000000144001 - 0x000000000000000f]
  CPU: 3 UID: 0 PID: 31345 Comm: sev_snp_vmsa_pu Tainted: G     U     O
  Tainted: [U]=USER, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
  Call Trace:
   <TASK>
   dump_stack_lvl+0x54/0x70
   rmpupdate+0x12c/0x140
   rmp_make_shared+0x3b/0x60
   sev_gmem_invalidate+0xe0/0x170 [kvm_amd]
   delete_from_page_cache_batch+0x1d8/0x220
   truncate_inode_pages_range+0x120/0x3d0
   kvm_gmem_fallocate+0x19a/0x270 [kvm]
   vfs_fallocate+0x1bc/0x1f0
   __x64_sys_fallocate+0x48/0x70
   do_syscall_64+0x10a/0x480
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x496c7e
   </TASK>
  ------------[ cut here ]------------
  SEV: Failed to update RMP entry for PFN 0x78d198 error -14
  WARNING: arch/x86/kvm/svm/sev.c:5160 at sev_gmem_invalidate+0x126/0x170 [kvm_amd], CPU#3: sev_snp_vmsa_pu/31345
  CPU: 3 UID: 0 PID: 31345 Comm: sev_snp_vmsa_pu Tainted: G     U     O
  Tainted: [U]=USER, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
  RIP: 0010:sev_gmem_invalidate+0x12b/0x170 [kvm_amd]
  Call Trace:
   <TASK>
   delete_from_page_cache_batch+0x1d8/0x220
   truncate_inode_pages_range+0x120/0x3d0
   kvm_gmem_fallocate+0x19a/0x270 [kvm]
   vfs_fallocate+0x1bc/0x1f0
   __x64_sys_fallocate+0x48/0x70
   do_syscall_64+0x10a/0x480
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x496c7e
   </TASK>
  irq event stamp: 20689
  hardirqs last  enabled at (20699): [<ffffffff8e76092c>] __console_unlock+0x5c/0x60
  hardirqs last disabled at (20708): [<ffffffff8e760911>] __console_unlock+0x41/0x60
  softirqs last  enabled at (20722): [<ffffffff8e6cd74e>] __irq_exit_rcu+0x7e/0x140
  softirqs last disabled at (20717): [<ffffffff8e6cd74e>] __irq_exit_rcu+0x7e/0x140
  ---[ end trace 0000000000000000 ]---
  BUG: unable to handle page fault for address: ffff99a64d198000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x80000003) - RMP violation
  PGD 13eb001067 P4D 13eb001067 PUD 78d1d1063 PMD 1184e0063 PTE 800000078d198163
  SEV-SNP: PFN 0x78d198, RMP entry: [0x6030000000144001 - 0x000000000000000f]
  Oops: Oops: 0003 [#1] SMP
  CPU: 3 UID: 0 PID: 31407 Comm: highlanderd_hea Tainted: G     U  W  O
  Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
  RIP: 0010:prep_new_page+0x67/0x220
  Call Trace:
   <TASK>
   get_page_from_freelist+0x1c40/0x1c70
   __alloc_frozen_pages_noprof+0xca/0x1f0
   alloc_pages_mpol+0x10b/0x1b0
   alloc_pages_noprof+0x81/0x90
   pte_alloc_one+0x1b/0xd0
   do_pte_missing+0xdf/0x1020
   handle_mm_fault+0x7c7/0xb20
   do_user_addr_fault+0x268/0x6b0
   exc_page_fault+0x67/0xa0
   asm_exc_page_fault+0x26/0x30
  RIP: 0033:0x4a6b1e
   </TASK>
  gsmi: Log Shutdown Reason 0x03
  CR2: ffff99a64d198000
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:prep_new_page+0x67/0x220

Drop the pseudo-TODO comment about needing to pin the page if guest_memfd
ever supports migration, as integrating with invalidation events means
KVM will Just Work if/when page migration is ever supported (assuming SNP
hardware supports migrating VMSA pages).

Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Closes: https://lore.kernel.org/all/aimMWzAf5b3luM0b@v4bel
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Cc: stable@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
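The four-step flow from the changelog can be modeled in user space as shown
below.  This is a single-threaded sketch under stated assumptions: all
struct, field, and function names are hypothetical, real locking is omitted
(a comment marks where mmu_lock would be taken), and a racing PUNCH_HOLE is
simulated through the optional "race" callback.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_PA	UINT64_MAX

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct vm_model {
	unsigned long invalidate_seq;	/* bumped when an invalidation ends */
	int invalidate_in_progress;	/* non-zero while one is running */
	uint64_t backing_pfn;		/* INVALID_PA if the page is gone */
};

struct vcpu_model {
	uint64_t vmsa_pa;		/* what the hardware would consume */
	bool reload_requested;		/* models a pending reload request */
};

typedef void (*race_fn)(struct vm_model *vm);

/* Models an invalidation, e.g. PUNCH_HOLE dropping the backing page. */
static void punch_hole(struct vm_model *vm)
{
	vm->invalidate_in_progress++;
	vm->backing_pfn = INVALID_PA;
	vm->invalidate_seq++;
	vm->invalidate_in_progress--;
}

/* Fuzzy by design: any invalidation since the snapshot forces a retry. */
static bool retry_needed(const struct vm_model *vm, unsigned long seq)
{
	return vm->invalidate_in_progress || vm->invalidate_seq != seq;
}

/* Returns false if no backing page exists, i.e. the run would fail. */
static bool reload_vmsa(struct vm_model *vm, struct vcpu_model *vcpu,
			race_fn race)
{
	unsigned long seq;
	uint64_t pfn;

	/* Always start from "no VMSA", so a failure is never stale. */
	vcpu->vmsa_pa = INVALID_PA;
	vcpu->reload_requested = false;

	seq = vm->invalidate_seq;	/* 1. snapshot the sequence counter */
	pfn = vm->backing_pfn;		/* 2. grab the pfn */
	if (pfn == INVALID_PA)
		return false;

	if (race)
		race(vm);		/* window for a racing invalidation */

	/* 3. the real code acquires mmu_lock for read here */
	if (retry_needed(vm, seq))	/* 4. re-request or commit */
		vcpu->reload_requested = true;
	else
		vcpu->vmsa_pa = pfn << 12;

	return true;
}
```

Note that the racing case does not fail immediately: the request is simply
re-issued, and it is the next reload that finds no backing page.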
 arch/x86/include/asm/kvm-x86-ops.h |  2 +
 arch/x86/include/asm/kvm_host.h    |  4 ++
 arch/x86/kvm/mmu/mmu.c             |  5 ++
 arch/x86/kvm/svm/sev.c             | 79 +++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.c             |  2 +
 arch/x86/kvm/svm/svm.h             |  2 +
 arch/x86/kvm/x86.c                 |  6 +++
 include/linux/kvm_host.h           |  1 +
 virt/kvm/guest_memfd.c             |  4 ++
 9 files changed, 94 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index acae9f6d6c5e..deb3ded5796e 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
 KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
 KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
 KVM_X86_OP_OPTIONAL(guest_memory_reclaimed)
+KVM_X86_OP_OPTIONAL(reload_vmsa)
 KVM_X86_OP(get_feature_msr)
 KVM_X86_OP(check_emulate_instruction)
 KVM_X86_OP(apic_init_signal_blocked)
@@ -148,6 +149,7 @@ KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
 KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+KVM_X86_OP_OPTIONAL(gmem_invalidate_range)
 KVM_X86_OP_OPTIONAL(gmem_reclaim_memory)
 #endif
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5e8603deb252..93af3bb82869 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -122,6 +122,8 @@
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VMSA_PAGE_RELOAD \
+	KVM_ARCH_REQ_FLAGS(33, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
 	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
 
@@ -1878,6 +1880,7 @@ struct kvm_x86_ops {
 	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
 	int (*vm_move_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
 	void (*guest_memory_reclaimed)(struct kvm *kvm);
+	void (*reload_vmsa)(struct kvm_vcpu *vcpu);
 
 	int (*get_feature_msr)(u32 msr, u64 *data);
 
@@ -1902,6 +1905,7 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+	void (*gmem_invalidate_range)(struct kvm *kvm, struct kvm_gfn_range *range);
 	void (*gmem_reclaim_memory)(kvm_pfn_t start, kvm_pfn_t end);
 #endif
 	int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3eb1f86593b1..e2978e9a1731 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7589,6 +7589,11 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 
 	write_lock(&kvm->mmu_lock);
 
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+	if (slot->gmem.file)
+		kvm_arch_gmem_invalidate_range(kvm, &range);
+#endif
+
 	if (zap_all) {
 		__kvm_mmu_zap_all_fast_front_half(kvm);
 	} else {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4465e75494f2..2d2c159f20c2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3979,19 +3979,25 @@ static int snp_begin_psc(struct vcpu_svm *svm)
 	return snp_do_psc(svm);
 }
 
-static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
+static void __sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_memory_slot *slot;
+	struct kvm *kvm = vcpu->kvm;
 	gfn_t gfn = gpa_to_gfn(gpa);
+	unsigned long mmu_seq;
 	struct page *page;
 	kvm_pfn_t pfn;
 
 	lockdep_assert_held(&svm->sev_es.snp_vmsa_mutex);
 
-	/* Clear use of the VMSA. */
+	/*
+	 * Clear use of the VMSA.  Ensure snp_guest_vmsa_gpa is written exactly
+	 * once, as it is read locklessly when responding to gfn invalidations.
+	 * Pairs with the READ_ONCE() in sev_gmem_invalidate_range().
+	 */
 	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
-	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
+	WRITE_ONCE(svm->sev_es.snp_guest_vmsa_gpa, INVALID_PAGE);
 
 	/*
 	 * When replacing the VMSA during SEV-SNP AP creation,
@@ -4006,6 +4012,9 @@ static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	if (!slot)
 		return;
 
+	mmu_seq = kvm->mmu_invalidate_seq;
+	smp_rmb();
+
 	/*
 	 * The new VMSA will be private memory guest memory, so retrieve the
 	 * PFN from the gmem backend.
@@ -4024,15 +4033,20 @@ static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	 */
 	svm->sev_es.snp_has_guest_vmsa = true;
 
-	/* Use the new VMSA */
+	read_lock(&kvm->mmu_lock);
+	/*
+	 * Save the guest-provided GPA.  If retry is needed, then KVM will try
+	 * again with the same GPA.  If the VMSA is usable, then KVM needs to
+	 * track the GPA so that the VMSA can be reloaded if the backing page
+	 * for the GPA is invalidated.
+	 */
 	svm->sev_es.snp_guest_vmsa_gpa = gpa;
-	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+	if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn))
+		kvm_make_request(KVM_REQ_VMSA_PAGE_RELOAD, vcpu);
+	else
+		svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+	read_unlock(&kvm->mmu_lock);
 
-	/*
-	 * gmem pages aren't currently migratable, but if this ever changes
-	 * then care should be taken to ensure svm->sev_es.vmsa is pinned
-	 * through some other means.
-	 */
 	kvm_release_page_clean(page);
 }
 
@@ -4058,7 +4072,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	gpa = svm->sev_es.snp_pending_vmsa_gpa;
 	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
 
-	sev_snp_reload_vmsa(vcpu, gpa);
+	__sev_snp_reload_vmsa(vcpu, gpa);
 
 	/*
 	 * Mark the vCPU as runnable for CREATE requests, indicated by a valid
@@ -4070,6 +4084,15 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 		kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
 }
 
+void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_sev_es_state *sev_es = &to_svm(vcpu)->sev_es;
+
+	guard(mutex)(&sev_es->snp_vmsa_mutex);
+
+	__sev_snp_reload_vmsa(vcpu, sev_es->snp_guest_vmsa_gpa);
+}
+
 static int sev_snp_ap_creation(struct vcpu_svm *svm)
 {
 	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
@@ -5135,6 +5158,40 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 
 	return 0;
 }
+void sev_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	/*
+	 * An unstable result for "is SNP" is a-ok here, thanks to mmu_lock.
+	 * The vCPU's VMSA GPA is invalidated before the vCPU is made visible
+	 * to other tasks, and can only become valid while holding mmu_lock,
+	 * after the VM is fully committed to being an SNP VM.
+	 */
+	if (!____sev_snp_guest(kvm))
+		return;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		/*
+		 * Read snp_guest_vmsa_gpa without taking the vCPU's VMSA mutex
+		 * (or its generic mutex) as mmu_lock is held, i.e. this task
+		 * can't sleep.  The VMSA is invalidated outside of mmu_lock,
+		 * but can only become valid inside of mmu_lock, i.e. the below
+		 * can get false positives, but not false negatives.  A false
+		 * positive is benign, as a spurious request simply forces the
+		 * vCPU to re-establish its VMSA.
+		 */
+		gpa_t gpa = READ_ONCE(to_svm(vcpu)->sev_es.snp_guest_vmsa_gpa);
+
+		if (VALID_PAGE(gpa) &&
+		    gpa_to_gfn(gpa) >= range->start &&
+		    gpa_to_gfn(gpa) < range->end)
+			kvm_make_request_and_kick(KVM_REQ_VMSA_PAGE_RELOAD, vcpu);
+	}
+}
 
 void sev_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end)
 {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6be0000ab386..9777b7fca79d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5445,6 +5445,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
 	.mem_enc_register_region = sev_mem_enc_register_region,
 	.mem_enc_unregister_region = sev_mem_enc_unregister_region,
 	.guest_memory_reclaimed = sev_guest_memory_reclaimed,
+	.reload_vmsa = sev_snp_reload_vmsa,
 
 	.vm_copy_enc_context_from = sev_vm_copy_enc_context_from,
 	.vm_move_enc_context_from = sev_vm_move_enc_context_from,
@@ -5462,6 +5463,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
 
 #ifdef CONFIG_KVM_AMD_SEV
 	.gmem_prepare = sev_gmem_prepare,
+	.gmem_invalidate_range = sev_gmem_invalidate_range,
 	.gmem_reclaim_memory = sev_gmem_reclaim_memory,
 	.gmem_max_mapping_level = sev_gmem_max_mapping_level,
 #endif
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index cf7c1a437f38..123e7bf687ef 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -996,6 +996,7 @@ static inline struct page *snp_safe_alloc_page(void)
 {
 	return snp_safe_alloc_page_node(numa_node_id(), GFP_KERNEL_ACCOUNT);
 }
+void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu);
 
 int sev_vcpu_create(struct kvm_vcpu *vcpu);
 void sev_free_vcpu(struct kvm_vcpu *vcpu);
@@ -1009,6 +1010,7 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range);
 void sev_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end);
 int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bc0c3163f4a3..0bb50997c0e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8170,6 +8170,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 				goto out;
 			}
 		}
+		if (kvm_check_request(KVM_REQ_VMSA_PAGE_RELOAD, vcpu))
+			kvm_x86_call(reload_vmsa)(vcpu);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -10595,6 +10597,10 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
 #endif
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+void kvm_arch_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	kvm_x86_call(gmem_invalidate_range)(kvm, range);
+}
 void kvm_arch_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end)
 {
 	kvm_x86_call(gmem_reclaim_memory)(start, end);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d777eaadbcd2..20d376900b88 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2607,6 +2607,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
 #endif
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+void kvm_arch_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range);
 void kvm_arch_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end);
 #endif
 
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index db0fcc38b145..262ba77e6e83 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -185,6 +185,10 @@ static void __kvm_gmem_invalidate_start(struct gmem_file *f, pgoff_t start,
 		}
 
 		flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
+
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+		kvm_arch_gmem_invalidate_range(kvm, &gfn_range);
+#endif
 	}
 
 	if (flush)
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 11/12] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (9 preceding siblings ...)
  2026-06-30 22:26 ` [PATCH v3 10/12] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped Sean Christopherson
@ 2026-06-30 22:26 ` Sean Christopherson
  2026-07-01 22:42   ` Michael Roth
  2026-06-30 22:26 ` [PATCH v3 12/12] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid Sean Christopherson
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:26 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Wrap the .gmem_prepare() declarations with HAVE_KVM_ARCH_GMEM_PREPARE so
that non-SEV code doesn't try to wire up a callback without doing the
necessary enabling.

No functional change intended.

Fixes: 3bb2531e20bf ("KVM: guest_memfd: Add hook for initializing memory")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 4 +++-
 arch/x86/include/asm/kvm_host.h    | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index deb3ded5796e..39247d2f29d6 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -146,12 +146,14 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
-KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
+#endif
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 KVM_X86_OP_OPTIONAL(gmem_invalidate_range)
 KVM_X86_OP_OPTIONAL(gmem_reclaim_memory)
 #endif
+KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
 #endif
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 93af3bb82869..cf2ec19212ad 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1903,7 +1903,9 @@ struct kvm_x86_ops {
 
 	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+#endif
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	void (*gmem_invalidate_range)(struct kvm *kvm, struct kvm_gfn_range *range);
 	void (*gmem_reclaim_memory)(kvm_pfn_t start, kvm_pfn_t end);
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v3 12/12] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid
  2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
                   ` (10 preceding siblings ...)
  2026-06-30 22:26 ` [PATCH v3 11/12] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y Sean Christopherson
@ 2026-06-30 22:26 ` Sean Christopherson
  2026-07-01 22:47   ` Michael Roth
  11 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-06-30 22:26 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim, Tom Lendacky,
	Michael Roth, Jörg Rödel, Fuad Tabba

Track the guest as having a guest-provided VMSA as soon as control.vmsa_pa
is invalidated, instead of waiting to see if the guest-provided VMSA is
usable, so that KVM doesn't switch back to the original VMSA instead of
exiting to userspace (due to an invalid VMSA).  By the time a vCPU tries
to load a guest-provided VMSA, KVM has already communicated "success" for
AP creation, i.e. KVM has committed to using the guest-provided VMSA.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
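A rough user-space model of the ordering change, for illustration only.
All names are hypothetical, and pick_vmsa() in particular is an assumed
simplification of how the run path might choose between the initial VMSA,
the guest-provided VMSA, and exiting to userspace; it is not KVM's logic.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_PA	UINT64_MAX

/* Hypothetical stand-in for the per-vCPU SNP state. */
struct vcpu_model {
	bool has_guest_vmsa;	/* sticky once set */
	uint64_t vmsa_pa;	/* INVALID_PA if nothing usable is loaded */
};

enum run_result {
	RUN_INITIAL_VMSA,
	RUN_GUEST_VMSA,
	RUN_EXIT_TO_USERSPACE,
};

/* Old ordering: only mark the vCPU once the new VMSA is known good. */
static void load_mark_late(struct vcpu_model *vcpu, uint64_t guest_pa)
{
	vcpu->vmsa_pa = INVALID_PA;
	if (guest_pa == INVALID_PA)
		return;

	vcpu->has_guest_vmsa = true;
	vcpu->vmsa_pa = guest_pa;
}

/* New ordering: commit to the guest VMSA before validating it. */
static void load_mark_early(struct vcpu_model *vcpu, uint64_t guest_pa)
{
	vcpu->vmsa_pa = INVALID_PA;
	vcpu->has_guest_vmsa = true;
	if (guest_pa == INVALID_PA)
		return;

	vcpu->vmsa_pa = guest_pa;
}

/* Assumed simplification of the decision made before running the vCPU. */
static enum run_result pick_vmsa(const struct vcpu_model *vcpu)
{
	if (!vcpu->has_guest_vmsa)
		return RUN_INITIAL_VMSA;

	if (vcpu->vmsa_pa == INVALID_PA)
		return RUN_EXIT_TO_USERSPACE;

	return RUN_GUEST_VMSA;
}
```

With the late marking, an unusable guest VMSA leaves the model free to fall
back to the initial VMSA; with the early marking it can only exit.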
 arch/x86/kvm/svm/sev.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2d2c159f20c2..ec426a5582aa 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4005,23 +4005,6 @@ static void __sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	 */
 	vmcb_mark_all_dirty(svm->vmcb);
 
-	if (!VALID_PAGE(gpa))
-		return;
-
-	slot = gfn_to_memslot(vcpu->kvm, gfn);
-	if (!slot)
-		return;
-
-	mmu_seq = kvm->mmu_invalidate_seq;
-	smp_rmb();
-
-	/*
-	 * The new VMSA will be private memory guest memory, so retrieve the
-	 * PFN from the gmem backend.
-	 */
-	if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
-		return;
-
 	/*
 	 * From this point forward, the VMSA will always be a guest-mapped page
 	 * rather than the initial one allocated by KVM in svm->sev_es.vmsa. In
@@ -4033,6 +4016,23 @@ static void __sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	 */
 	svm->sev_es.snp_has_guest_vmsa = true;
 
+	if (!VALID_PAGE(gpa))
+		return;
+
+	slot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!slot)
+		return;
+
+	mmu_seq = kvm->mmu_invalidate_seq;
+	smp_rmb();
+
+	/*
+	 * The new VMSA will be private memory guest memory, so retrieve the
+	 * PFN from the gmem backend.
+	 */
+	if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
+		return;
+
 	read_lock(&kvm->mmu_lock);
 	/*
 	 * Save the guest-provided GPA.  If retry is needed, then KVM will try
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* Re: [PATCH v3 04/12] KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory()
  2026-06-30 22:25 ` [PATCH v3 04/12] KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory() Sean Christopherson
@ 2026-06-30 22:39   ` sashiko-bot
  2026-07-01 19:41   ` Michael Roth
  1 sibling, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-06-30 22:39 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Failure to leak a page after an RMP transition error in sev_gmem_reclaim_memory allows hypervisor-owned pages to be unsafely returned to the host page allocator.
--

commit c2f5753d0289e9cb80f17e6494a0d03b0f6a529a
Author: Sean Christopherson <seanjc@google.com>

KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory()

Rename the .gmem_invalidate() hook to .gmem_reclaim_memory() since it is
called when a folio is freed, which is too late to invalidate memory usage.
Keep guest_memfd's trampoline and opportunistically guard the callback with
an ifdef.

> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 30792adcfc8e5..4465e75494f2d 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -5136,7 +5136,7 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
>  	return 0;
>  }
>  
> -void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> +void sev_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end)
>  {

[Severity: Critical]
This is a pre-existing issue, but does failing to leak the page on RMP update
errors allow it to be unsafely returned to the host allocator?

In sev_gmem_reclaim_memory(), if rmp_make_shared() fails, the code logs a
warning and proceeds to the next PFN:

    rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
    if (WARN_ONCE(rc, "SEV: Failed to update RMP entry ..."))
        goto next_pfn;

By jumping to next_pfn instead of calling snp_leak_pages(), the
hypervisor-owned page could be freed back to the host allocator.

Could this trigger a hardware RMP violation and a host panic upon the page's
next use? A later patch in the series fixes a guest-triggerable vector for
this failure, but the structural issue remains.
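As a rough illustration of the leak-on-failure pattern being suggested, the
following user-space sketch may help.  Every name in it is hypothetical
(make_shared() merely stands in for the RMP transition, and the in_use flag
models firmware rejecting it); it is not the kernel code and not a proposed
patch.

```c
#include <stdbool.h>
#include <stddef.h>

enum page_state {
	PAGE_ASSIGNED,	/* still owned by the guest */
	PAGE_SHARED,	/* safe for the host to reuse */
	PAGE_LEAKED,	/* deliberately never returned to the allocator */
};

/* Hypothetical per-page model. */
struct page_model {
	enum page_state state;
	bool in_use;	/* the transition to shared will be rejected */
};

/* Stand-in for the RMP transition: fails while the page is in use. */
static int make_shared(struct page_model *page)
{
	if (page->in_use)
		return -1;

	page->state = PAGE_SHARED;
	return 0;
}

/*
 * Reclaim a range of pages.  A page whose transition fails is leaked
 * instead of being skipped, so that it can never be handed back to the
 * allocator while still assigned.  Returns the number of pages that are
 * safe to free.
 */
static size_t reclaim_range(struct page_model *pages, size_t nr_pages)
{
	size_t freeable = 0;
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (make_shared(&pages[i])) {
			pages[i].state = PAGE_LEAKED;
			continue;
		}
		freeable++;
	}

	return freeable;
}
```

The trade-off is a permanently lost page in exchange for never reusing
memory that is still assigned to a guest.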

>  	kvm_pfn_t pfn;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260630222607.497895-1-seanjc@google.com?part=4


* Re: [PATCH v3 01/12] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests
  2026-06-30 22:25 ` [PATCH v3 01/12] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
@ 2026-07-01 19:33   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 19:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:25:56PM -0700, Sean Christopherson wrote:
> Track the GPA of the guest-provided VMSA used after AP_CREATION events when
> running SNP guests, instead of simply tracking whether or not the vCPU is
> using a guest-provided VMSA.  KVM needs to know the GPA of the VMSA that's
> actively being used so that it can react to MMU invalidation events, i.e.
> so that KVM can drop the VMSA if its backing guest_memfd page is punched
> out of existence.
> 
> Opportunistically rename snp_vmsa_gpa to clarify that it tracks the pending
> VMSA GPA, whereas snp_guest_vmsa_gpa now tracks the in-use VMSA GPA.
> 
> Note!  Take care to track the GPA, not the GFN, as VALID_PAGE() won't
> behave correctly if an invalid GFN is converted to a GPA for checking.
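
To make the GPA-vs-GFN note above concrete, here is a small userspace
sketch.  The MODEL_* macros and model_* helpers are local stand-ins that
are assumed to mirror the kernel's INVALID_PAGE, VALID_PAGE(),
gfn_to_gpa() and gpa_to_gfn() with 4KiB pages; they are not the kernel
definitions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

/* Local stand-ins, assumed to mirror the kernel's definitions. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_INVALID_PAGE	(~(uint64_t)0)
#define MODEL_VALID_PAGE(x)	((x) != MODEL_INVALID_PAGE)

static inline gpa_t model_gfn_to_gpa(gfn_t gfn)
{
	return (gpa_t)gfn << MODEL_PAGE_SHIFT;
}

static inline gfn_t model_gpa_to_gfn(gpa_t gpa)
{
	return (gfn_t)(gpa >> MODEL_PAGE_SHIFT);
}

/* Tracking the GPA: the sentinel survives, so the check works. */
static inline bool model_gpa_tracking_is_valid(gpa_t tracked_gpa)
{
	return MODEL_VALID_PAGE(tracked_gpa);
}

/* Tracking the GFN: the check must convert back to a GPA first. */
static inline bool model_gfn_tracking_is_valid(gfn_t tracked_gfn)
{
	return MODEL_VALID_PAGE(model_gfn_to_gpa(tracked_gfn));
}

/* Each assert documents the behavior the note warns about. */
static void model_show_pitfall(void)
{
	gfn_t bad_gfn = model_gpa_to_gfn(MODEL_INVALID_PAGE);

	assert(!model_gpa_tracking_is_valid(MODEL_INVALID_PAGE));

	/* The low 12 bits are gone, so the sentinel no longer matches. */
	assert(model_gfn_to_gpa(bad_gfn) == 0xfffffffffffff000ull);
	assert(model_gfn_tracking_is_valid(bad_gfn));
}
```

In other words, once the "invalid" marker has been squeezed through a GFN
it looks like an ordinary (if absurd) GPA, and a VALID_PAGE()-style check
can no longer tell the difference.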
> 
> Note #2!  Keep snp_has_guest_vmsa so that switching to a guest-provided
> VMSA is sticky, even if the guest-provided VMSA becomes invalid.
> 
> No functional change intended.
> 
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/svm/sev.c | 14 +++++++++-----
>  arch/x86/kvm/svm/svm.h |  3 ++-
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 74fb15551e83..827f5dc06102 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4003,6 +4003,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
>  
>  	/* Clear use of the VMSA */
>  	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
> +	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
>  
>  	/*
>  	 * When replacing the VMSA during SEV-SNP AP creation,
> @@ -4010,11 +4011,11 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
>  	 */
>  	vmcb_mark_all_dirty(svm->vmcb);
>  
> -	if (!VALID_PAGE(svm->sev_es.snp_vmsa_gpa))
> +	if (!VALID_PAGE(svm->sev_es.snp_pending_vmsa_gpa))
>  		return;
>  
> -	gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
> -	svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
> +	gfn = gpa_to_gfn(svm->sev_es.snp_pending_vmsa_gpa);
> +	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
>  
>  	slot = gfn_to_memslot(vcpu->kvm, gfn);
>  	if (!slot)
> @@ -4039,6 +4040,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
>  	svm->sev_es.snp_has_guest_vmsa = true;
>  
>  	/* Use the new VMSA */
> +	svm->sev_es.snp_guest_vmsa_gpa = gfn_to_gpa(gfn);
>  	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
>  
>  	/* Mark the vCPU as runnable */
> @@ -4105,10 +4107,10 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
>  			return -EINVAL;
>  		}
>  
> -		target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
> +		target_svm->sev_es.snp_pending_vmsa_gpa = svm->vmcb->control.exit_info_2;
>  		break;
>  	case SVM_VMGEXIT_AP_DESTROY:
> -		target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
> +		target_svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
>  		break;
>  	default:
>  		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
> @@ -4791,6 +4793,8 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu)
>  		return -ENOMEM;
>  
>  	svm->sev_es.vmsa = page_address(vmsa_page);
> +	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
> +	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
>  
>  	vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 716be21fba33..d077783c287e 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -271,7 +271,8 @@ struct vcpu_sev_es_state {
>  	u64 ghcb_registered_gpa;
>  
>  	struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
> -	gpa_t snp_vmsa_gpa;
> +	gpa_t snp_pending_vmsa_gpa;
> +	gpa_t snp_guest_vmsa_gpa;
>  	bool snp_ap_waiting_for_reset;
>  	bool snp_has_guest_vmsa;
>  };
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

* Re: [PATCH v3 02/12] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper
  2026-06-30 22:25 ` [PATCH v3 02/12] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper Sean Christopherson
@ 2026-07-01 19:34   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 19:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:25:57PM -0700, Sean Christopherson wrote:
> Extract the loading/retrieval of a guest-provided VMSA to a separate helper
> so that KVM can reuse the core logic when refreshing the VMSA after an MMU
> invalidation from guest_memfd.
> 
> No functional change intended.
> 
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/svm/sev.c | 52 +++++++++++++++++++++++++-----------------
>  1 file changed, 31 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 827f5dc06102..d8ed00f76aa3 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3979,29 +3979,17 @@ static int snp_begin_psc(struct vcpu_svm *svm)
>  	return snp_do_psc(svm);
>  }
>  
> -/*
> - * Invoked as part of svm_vcpu_reset() processing of an init event.
> - */
> -static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
> +static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct kvm_memory_slot *slot;
> +	gfn_t gfn = gpa_to_gfn(gpa);
>  	struct page *page;
>  	kvm_pfn_t pfn;
> -	gfn_t gfn;
>  
> -	guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
> +	lockdep_assert_held(&svm->sev_es.snp_vmsa_mutex);
>  
> -	if (!svm->sev_es.snp_ap_waiting_for_reset)
> -		return;
> -
> -	svm->sev_es.snp_ap_waiting_for_reset = false;
> -
> -	/* Mark the vCPU as offline and not runnable */
> -	vcpu->arch.pv.pv_unhalted = false;
> -	kvm_set_mp_state(vcpu, KVM_MP_STATE_HALTED);
> -
> -	/* Clear use of the VMSA */
> +	/* Clear use of the VMSA. */
>  	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
>  	svm->sev_es.snp_guest_vmsa_gpa = INVALID_PAGE;
>  
> @@ -4011,12 +3999,9 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
>  	 */
>  	vmcb_mark_all_dirty(svm->vmcb);
>  
> -	if (!VALID_PAGE(svm->sev_es.snp_pending_vmsa_gpa))
> +	if (!VALID_PAGE(gpa))
>  		return;
>  
> -	gfn = gpa_to_gfn(svm->sev_es.snp_pending_vmsa_gpa);
> -	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
> -
>  	slot = gfn_to_memslot(vcpu->kvm, gfn);
>  	if (!slot)
>  		return;
> @@ -4040,7 +4025,7 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
>  	svm->sev_es.snp_has_guest_vmsa = true;
>  
>  	/* Use the new VMSA */
> -	svm->sev_es.snp_guest_vmsa_gpa = gfn_to_gpa(gfn);
> +	svm->sev_es.snp_guest_vmsa_gpa = gpa;
>  	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
>  
>  	/* Mark the vCPU as runnable */
> @@ -4054,6 +4039,31 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
>  	kvm_release_page_clean(page);
>  }
>  
> +/*
> + * Invoked as part of svm_vcpu_reset() processing of an init event.
> + */
> +static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	gpa_t gpa;
> +
> +	guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
> +
> +	if (!svm->sev_es.snp_ap_waiting_for_reset)
> +		return;
> +
> +	svm->sev_es.snp_ap_waiting_for_reset = false;
> +
> +	/* Mark the vCPU as offline and not runnable */
> +	vcpu->arch.pv.pv_unhalted = false;
> +	kvm_set_mp_state(vcpu, KVM_MP_STATE_HALTED);
> +
> +	gpa = svm->sev_es.snp_pending_vmsa_gpa;
> +	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
> +
> +	sev_snp_reload_vmsa(vcpu, gpa);
> +}
> +
>  static int sev_snp_ap_creation(struct vcpu_svm *svm)
>  {
>  	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

* Re: [PATCH v3 03/12] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable
  2026-06-30 22:25 ` [PATCH v3 03/12] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable Sean Christopherson
@ 2026-07-01 19:36   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:25:58PM -0700, Sean Christopherson wrote:
> Always mark the vCPU as RUNNABLE after responding to AP_CREATE, even if the
> guest-specified VMSA is unusable, e.g. isn't backed by a memslot or doesn't
> have a backing guest_memfd page.  If the VMSA is unusable, leaving the vCPU
> in a non-running state will effectively hang the vCPU instead of reporting
> an error to userspace.  This will also allow retrying the VMSA load in the
> future, to fix a bug where KVM doesn't honor guest_memfd invalidation
> events, e.g. if AP_CREATION races with PUNCH_HOLE.
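
The behavior change described above can be sketched as a tiny state
model.  Everything here is hypothetical (struct model_vcpu, the MODEL_*
enums, and the identity "translation" of GPA to PA); the only things
taken from the series are the rules that a valid pending GPA indicates a
CREATE request and that running without a usable VMSA is an error rather
than a hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_INVALID_PAGE	(~(uint64_t)0)
#define MODEL_VALID_PAGE(x)	((x) != MODEL_INVALID_PAGE)

enum model_mp_state { MODEL_MP_HALTED, MODEL_MP_RUNNABLE };
enum model_run_result { MODEL_RUN_OK, MODEL_RUN_ERROR, MODEL_RUN_BLOCKED };

struct model_vcpu {
	enum model_mp_state mp_state;
	uint64_t vmsa_pa;
};

/* Hypothetical loader: installs a VMSA only if a backing page exists. */
static void model_load_vmsa(struct model_vcpu *vcpu, uint64_t gpa,
			    bool backing_page_available)
{
	vcpu->vmsa_pa = MODEL_INVALID_PAGE;
	if (MODEL_VALID_PAGE(gpa) && backing_page_available)
		vcpu->vmsa_pa = gpa;	/* identity mapping, model only */
}

static void model_handle_ap_reset(struct model_vcpu *vcpu,
				  uint64_t pending_gpa,
				  bool backing_page_available)
{
	assert(vcpu);

	vcpu->mp_state = MODEL_MP_HALTED;
	model_load_vmsa(vcpu, pending_gpa, backing_page_available);

	/*
	 * A valid GPA means CREATE: mark the vCPU runnable even if the
	 * load failed.  An invalid GPA means DESTROY: stay halted.
	 */
	if (MODEL_VALID_PAGE(pending_gpa))
		vcpu->mp_state = MODEL_MP_RUNNABLE;
}

static enum model_run_result model_run(const struct model_vcpu *vcpu)
{
	if (vcpu->mp_state != MODEL_MP_RUNNABLE)
		return MODEL_RUN_BLOCKED;	/* would wait indefinitely */
	if (!MODEL_VALID_PAGE(vcpu->vmsa_pa))
		return MODEL_RUN_ERROR;		/* reported to userspace */
	return MODEL_RUN_OK;
}
```

With the old ordering, a CREATE whose VMSA could not be loaded would land
in the MODEL_RUN_BLOCKED case; with the new ordering it lands in
MODEL_RUN_ERROR, which userspace can actually observe.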
> 
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/svm/sev.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index d8ed00f76aa3..30792adcfc8e 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4028,9 +4028,6 @@ static void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
>  	svm->sev_es.snp_guest_vmsa_gpa = gpa;
>  	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
>  
> -	/* Mark the vCPU as runnable */
> -	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
> -
>  	/*
>  	 * gmem pages aren't currently migratable, but if this ever changes
>  	 * then care should be taken to ensure svm->sev_es.vmsa is pinned
> @@ -4062,6 +4059,15 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
>  	svm->sev_es.snp_pending_vmsa_gpa = INVALID_PAGE;
>  
>  	sev_snp_reload_vmsa(vcpu, gpa);
> +
> +	/*
> +	 * Mark the vCPU as runnable for CREATE requests, indicated by a valid
> +	 * VMSA GPA, even if installing the VMSA failed, so that KVM_RUN will
> +	 * fail instead of blocking indefinitely and hanging the vCPU, e.g. if
> +	 * the backing guest_memfd page is unavailable.
> +	 */
> +	if (VALID_PAGE(gpa))
> +		kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
>  }
>  
>  static int sev_snp_ap_creation(struct vcpu_svm *svm)
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

* Re: [PATCH v3 04/12] KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory()
  2026-06-30 22:25 ` [PATCH v3 04/12] KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory() Sean Christopherson
  2026-06-30 22:39   ` sashiko-bot
@ 2026-07-01 19:41   ` Michael Roth
  1 sibling, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 19:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:25:59PM -0700, Sean Christopherson wrote:
> Rename .gmem_invalidate() to .gmem_reclaim_memory() as the hook is called
> when a folio is freed, which is far too late and lacks sufficient
> information for KVM to actually invalidate its usage of the memory.
> 
> Keep guest_memfd's trampoline, even though it would be trivial to wire up
> .free_folio() directly to an arch callback, to avoid bleeding guest_memfd
> internals into arch code (specifically, avoid referencing folios in arch
> code).
> 
> Opportunistically guard kvm_x86_ops.gmem_reclaim_memory() with an ifdef to
> ensure the callback will actually be called, e.g. so that non-SEV code
> doesn't try to wire up a callback without enabling
> CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE.
> 
> No functional change intended.
> 
> Cc: Ackerley Tng <ackerleytng@google.com>
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/include/asm/kvm-x86-ops.h | 4 +++-
>  arch/x86/include/asm/kvm_host.h    | 4 +++-
>  arch/x86/kvm/svm/sev.c             | 2 +-
>  arch/x86/kvm/svm/svm.c             | 4 +++-
>  arch/x86/kvm/svm/svm.h             | 3 +--
>  arch/x86/kvm/x86.c                 | 4 ++--
>  include/linux/kvm_host.h           | 2 +-
>  virt/kvm/guest_memfd.c             | 2 +-
>  8 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 83dc5086138b..acae9f6d6c5e 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -147,7 +147,9 @@ KVM_X86_OP_OPTIONAL(get_untagged_addr)
>  KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
>  KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
>  KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
> -KVM_X86_OP_OPTIONAL(gmem_invalidate)
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +KVM_X86_OP_OPTIONAL(gmem_reclaim_memory)
> +#endif
>  #endif
>  
>  #undef KVM_X86_OP
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b517257a6315..5e8603deb252 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1901,7 +1901,9 @@ struct kvm_x86_ops {
>  	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
>  	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
>  	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> -	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +	void (*gmem_reclaim_memory)(kvm_pfn_t start, kvm_pfn_t end);
> +#endif
>  	int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
>  };
>  
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 30792adcfc8e..4465e75494f2 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -5136,7 +5136,7 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
>  	return 0;
>  }
>  
> -void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> +void sev_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end)
>  {
>  	kvm_pfn_t pfn;
>  
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ef69a51ab27f..6be0000ab386 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5460,9 +5460,11 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
>  	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
>  
> +#ifdef CONFIG_KVM_AMD_SEV
>  	.gmem_prepare = sev_gmem_prepare,
> -	.gmem_invalidate = sev_gmem_invalidate,
> +	.gmem_reclaim_memory = sev_gmem_reclaim_memory,
>  	.gmem_max_mapping_level = sev_gmem_max_mapping_level,
> +#endif
>  };
>  
>  /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index d077783c287e..cf7c1a437f38 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -1009,7 +1009,7 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
>  extern unsigned int max_sev_asid;
>  void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
>  int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> -void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
> +void sev_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end);
>  int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
>  struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
>  void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa);
> @@ -1039,7 +1039,6 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
>  {
>  	return 0;
>  }
> -static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
>  static inline int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
>  {
>  	return 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0626e835e9eb..6b9a1b0b1460 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10592,9 +10592,9 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
>  #endif
>  
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> -void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> +void kvm_arch_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end)
>  {
> -	kvm_x86_call(gmem_invalidate)(start, end);
> +	kvm_x86_call(gmem_reclaim_memory)(start, end);
>  }
>  #endif
>  #endif
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ab8cfaec82d3..d777eaadbcd2 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2607,7 +2607,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
>  #endif
>  
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> -void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
> +void kvm_arch_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end);
>  #endif
>  
>  #ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 86690683b2fe..db0fcc38b145 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -530,7 +530,7 @@ static void kvm_gmem_free_folio(struct folio *folio)
>  	kvm_pfn_t pfn = page_to_pfn(page);
>  	int order = folio_order(folio);
>  
> -	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
> +	kvm_arch_gmem_reclaim_memory(pfn, pfn + (1ul << order));
>  }
>  #endif
>  
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

* Re: [PATCH v3 10/12] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped
  2026-06-30 22:26 ` [PATCH v3 10/12] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped Sean Christopherson
@ 2026-07-01 21:56   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 21:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:26:05PM -0700, Sean Christopherson wrote:
> Wire up a gmem_invalidate_range() call for SNP VMs, and use it to force vCPUs
> to reload/recheck their guest-provided VMSA if the backing guest_memfd page
> is being invalidated, e.g. is being PUNCH_HOLE'd.  Use the same core logic
> to handle invalidations as VMX does for the APIC-access page, as the two
> concepts are nearly identical: shove the physical address of a page into
> the vCPU's control structure:
> 
>  1. Snapshot the invalidation sequence counter
>  2. Grab the pfn (from guest_memfd in this case)
>  3. Acquire mmu_lock for read
>  4. Re-request reload if retry is needed, otherwise commit the change.
> 
> Note, the re-request action in #4 is necessary as KVM's retry logic is
> fuzzy, i.e. can get false positives.  If the guest_memfd page has been
> dropped, at some point a subsequent reload will fail to get a PFN from
> guest_memfd, and KVM will fail KVM_RUN.  If the retry was due to a false
> positive, KVM will retry until there are no relevant MMU notifier events
> (and will retry in the "outer" loop, i.e. will drop locks and resched as
> needed).
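
The four steps and the re-request behavior above can be sketched as a
single-threaded userspace model.  All names are hypothetical (struct
model_kvm, model_reload_vmsa(), the race hooks), mmu_lock is omitted
because nothing actually runs concurrently, and the "retry needed" check
is reduced to a sequence comparison; this is not KVM's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_PAGE	(~(uint64_t)0)
#define MODEL_PAGE_SHIFT	12

struct model_kvm {
	unsigned long invalidate_seq;	/* bumped by every invalidation */
	bool page_present;		/* does guest_memfd back the GPA? */
	uint64_t pfn;
};

struct model_vcpu {
	uint64_t vmsa_pa;
	bool reload_requested;
};

/* Stand-in for an invalidation event, e.g. PUNCH_HOLE. */
static void model_invalidate(struct model_kvm *kvm, bool drop_page)
{
	kvm->invalidate_seq++;
	if (drop_page)
		kvm->page_present = false;
}

/*
 * Returns 0 if the VMSA was committed or a reload was re-requested, and
 * -1 if there is no backing page.  @race, if non-NULL, runs between the
 * lookup and the retry check to simulate a concurrent invalidation.
 */
static int model_reload_vmsa(struct model_kvm *kvm, struct model_vcpu *vcpu,
			     void (*race)(struct model_kvm *kvm))
{
	unsigned long seq;
	uint64_t pfn;

	assert(kvm && vcpu);

	vcpu->reload_requested = false;
	vcpu->vmsa_pa = MODEL_INVALID_PAGE;

	seq = kvm->invalidate_seq;		/* 1. snapshot the sequence */

	if (!kvm->page_present)			/* 2. grab the pfn */
		return -1;
	pfn = kvm->pfn;

	if (race)
		race(kvm);

	/* 3. the kernel would hold mmu_lock for read from here on */
	if (kvm->invalidate_seq != seq) {	/* 4. retry needed? */
		vcpu->reload_requested = true;
		return 0;
	}

	vcpu->vmsa_pa = pfn << MODEL_PAGE_SHIFT;
	return 0;
}

/* A real invalidation of the VMSA page. */
static void model_race_punch_hole(struct model_kvm *kvm)
{
	model_invalidate(kvm, true);
}

/* An unrelated invalidation, i.e. a false positive for the retry check. */
static void model_race_unrelated(struct model_kvm *kvm)
{
	model_invalidate(kvm, false);
}
```

Calling model_reload_vmsa() again after a re-request either commits (the
false-positive case) or returns -1 (the page really is gone), which is
the convergence argument made in the note.
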
> 
> Note #2!  Take care to invalidate the VMSA when a relevant memslot is
> DELETED or MOVED, as invalidations in response to PUNCH_HOLE are predicated
> on memslot bindings (KVM doesn't know what GFN range(s) to invalidate
> without a binding).  And more importantly, the VMSA mapping requires a
> memslot, i.e. must be invalidated if its memslot disappears, regardless of
> the state of the underlying guest_memfd inode.
> 
> Failure to invalidate the vCPU's control.vmsa_pa (which is checked by
> pre_sev_run()) can prevent KVM from properly freeing the page as firmware
> will reject the RMPUPDATE to reclaim the page with FAIL_INUSE if the vCPU
> is actively running, i.e. if the VMSA page is in use.  That in turn leads
> to an RMP #PF on the next use, as the page will still be assigned to the
> SNP VM.
> 
>   SEV-SNP: RMPUPDATE failed for PFN 78d198, pg_level: 1, ret: 3
>   SEV-SNP: PFN 0x78d198, RMP entry: [0xfff0000000144001 - 0x000000000000000f]
>   CPU: 3 UID: 0 PID: 31345 Comm: sev_snp_vmsa_pu Tainted: G     U     O
>   Tainted: [U]=USER, [O]=OOT_MODULE
>   Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x54/0x70
>    rmpupdate+0x12c/0x140
>    rmp_make_shared+0x3b/0x60
>    sev_gmem_invalidate+0xe0/0x170 [kvm_amd]
>    delete_from_page_cache_batch+0x1d8/0x220
>    truncate_inode_pages_range+0x120/0x3d0
>    kvm_gmem_fallocate+0x19a/0x270 [kvm]
>    vfs_fallocate+0x1bc/0x1f0
>    __x64_sys_fallocate+0x48/0x70
>    do_syscall_64+0x10a/0x480
>    entry_SYSCALL_64_after_hwframe+0x4b/0x53
>   RIP: 0033:0x496c7e
>    </TASK>
>   ------------[ cut here ]------------
>   SEV: Failed to update RMP entry for PFN 0x78d198 error -14
>   WARNING: arch/x86/kvm/svm/sev.c:5160 at sev_gmem_invalidate+0x126/0x170 [kvm_amd], CPU#3: sev_snp_vmsa_pu/31345
>   CPU: 3 UID: 0 PID: 31345 Comm: sev_snp_vmsa_pu Tainted: G     U     O
>   Tainted: [U]=USER, [O]=OOT_MODULE
>   Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
>   RIP: 0010:sev_gmem_invalidate+0x12b/0x170 [kvm_amd]
>   Call Trace:
>    <TASK>
>    delete_from_page_cache_batch+0x1d8/0x220
>    truncate_inode_pages_range+0x120/0x3d0
>    kvm_gmem_fallocate+0x19a/0x270 [kvm]
>    vfs_fallocate+0x1bc/0x1f0
>    __x64_sys_fallocate+0x48/0x70
>    do_syscall_64+0x10a/0x480
>    entry_SYSCALL_64_after_hwframe+0x4b/0x53
>   RIP: 0033:0x496c7e
>    </TASK>
>   irq event stamp: 20689
>   hardirqs last  enabled at (20699): [<ffffffff8e76092c>] __console_unlock+0x5c/0x60
>   hardirqs last disabled at (20708): [<ffffffff8e760911>] __console_unlock+0x41/0x60
>   softirqs last  enabled at (20722): [<ffffffff8e6cd74e>] __irq_exit_rcu+0x7e/0x140
>   softirqs last disabled at (20717): [<ffffffff8e6cd74e>] __irq_exit_rcu+0x7e/0x140
>   ---[ end trace 0000000000000000 ]---
>   BUG: unable to handle page fault for address: ffff99a64d198000
>   #PF: supervisor write access in kernel mode
>   #PF: error_code(0x80000003) - RMP violation
>   PGD 13eb001067 P4D 13eb001067 PUD 78d1d1063 PMD 1184e0063 PTE 800000078d198163
>   SEV-SNP: PFN 0x78d198, RMP entry: [0x6030000000144001 - 0x000000000000000f]
>   Oops: Oops: 0003 [#1] SMP
>   CPU: 3 UID: 0 PID: 31407 Comm: highlanderd_hea Tainted: G     U  W  O
>   Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
>   Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
>   RIP: 0010:prep_new_page+0x67/0x220
>   Call Trace:
>    <TASK>
>    get_page_from_freelist+0x1c40/0x1c70
>    __alloc_frozen_pages_noprof+0xca/0x1f0
>    alloc_pages_mpol+0x10b/0x1b0
>    alloc_pages_noprof+0x81/0x90
>    pte_alloc_one+0x1b/0xd0
>    do_pte_missing+0xdf/0x1020
>    handle_mm_fault+0x7c7/0xb20
>    do_user_addr_fault+0x268/0x6b0
>    exc_page_fault+0x67/0xa0
>    asm_exc_page_fault+0x26/0x30
>   RIP: 0033:0x4a6b1e
>    </TASK>
>   gsmi: Log Shutdown Reason 0x03
>   CR2: ffff99a64d198000
>   ---[ end trace 0000000000000000 ]---
>   RIP: 0010:prep_new_page+0x67/0x220
> 
> Drop the pseudo-TODO comment about needing to pin the page if guest_memfd
> ever supports migration, as integrating with invalidation events means
> KVM will Just Work if/when page migration is ever supported (assuming SNP
> hardware supports migrating VMSA pages).
> 
> Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
> Closes: https://lore.kernel.org/all/aimMWzAf5b3luM0b@v4bel
> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
> Cc: stable@vger.kernel.org
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Michael Roth <michael.roth@amd.com>
> Cc: Jörg Rödel <joro@8bytes.org>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

Small/optional nit below:

> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  2 +
>  arch/x86/include/asm/kvm_host.h    |  4 ++
>  arch/x86/kvm/mmu/mmu.c             |  5 ++
>  arch/x86/kvm/svm/sev.c             | 79 +++++++++++++++++++++++++-----
>  arch/x86/kvm/svm/svm.c             |  2 +
>  arch/x86/kvm/svm/svm.h             |  2 +
>  arch/x86/kvm/x86.c                 |  6 +++
>  include/linux/kvm_host.h           |  1 +
>  virt/kvm/guest_memfd.c             |  4 ++
>  9 files changed, 94 insertions(+), 11 deletions(-)
> 
...
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index cf7c1a437f38..123e7bf687ef 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -996,6 +996,7 @@ static inline struct page *snp_safe_alloc_page(void)
>  {
>  	return snp_safe_alloc_page_node(numa_node_id(), GFP_KERNEL_ACCOUNT);
>  }
> +void sev_snp_reload_vmsa(struct kvm_vcpu *vcpu);
>  
>  int sev_vcpu_create(struct kvm_vcpu *vcpu);
>  void sev_free_vcpu(struct kvm_vcpu *vcpu);
> @@ -1009,6 +1010,7 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
>  extern unsigned int max_sev_asid;
>  void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
>  int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> +void sev_gmem_invalidate_range(struct kvm *kvm, struct kvm_gfn_range *range);
>  void sev_gmem_reclaim_memory(kvm_pfn_t start, kvm_pfn_t end);
>  int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
>  struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bc0c3163f4a3..0bb50997c0e3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8170,6 +8170,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  				goto out;
>  			}
>  		}
> +		if (kvm_check_request(KVM_REQ_VMSA_PAGE_RELOAD, vcpu))
> +			kvm_x86_call(reload_vmsa)(vcpu);

"VMSA" is SVM/SEV-specific terminology.  Even if the event and its handling
end up being SEV-specific, would it make sense to at least give the
kvm_x86_op a generic name?

'reload_guest_save_area' maybe?

Thanks,

Mike

* Re: [PATCH v3 05/12] KVM: x86: Serialize writes to disabled_quirks using kvm->lock
  2026-06-30 22:26 ` [PATCH v3 05/12] KVM: x86: Serialize writes to disabled_quirks using kvm->lock Sean Christopherson
@ 2026-07-01 21:59   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 21:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:26:00PM -0700, Sean Christopherson wrote:
> Protect writes to disabled_quirks with kvm->lock to ensure KVM doesn't
> clobber state in the unlikely scenario that userspace disables disparate
> quirks from multiple tasks.  More importantly, this will allow wrapping
> accesses with {READ,WRITE}_ONCE without "needing" to also guard the writer
> with a useless and confusing READ_ONCE (since the RMW wouldn't be atomic
> anyways).
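
For illustration, the clobbering scenario can be shown deterministically
without any threads by spelling out one bad interleaving of two
read-modify-write sequences.  This is a userspace sketch with
hypothetical function names; in the real code the serialization comes
from holding kvm->lock around the update.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two tasks disable different quirks without serialization.  Both load
 * the old value before either stores, so the second store drops the
 * first task's bit.
 */
static uint64_t model_rmw_unserialized(uint64_t initial, uint64_t quirk_a,
				       uint64_t quirk_b)
{
	uint64_t disabled_quirks = initial;
	uint64_t seen_by_a, seen_by_b;

	assert(!(quirk_a & quirk_b));		/* "disparate" quirks */

	seen_by_a = disabled_quirks;		/* task A loads */
	seen_by_b = disabled_quirks;		/* task B loads too early */
	disabled_quirks = seen_by_a | quirk_a;	/* task A stores */
	disabled_quirks = seen_by_b | quirk_b;	/* task B clobbers A's bit */

	return disabled_quirks;
}

/*
 * The same two updates with each load/OR/store sequence run to
 * completion before the next one starts, as a lock would enforce.
 */
static uint64_t model_rmw_serialized(uint64_t initial, uint64_t quirk_a,
				     uint64_t quirk_b)
{
	uint64_t disabled_quirks = initial;

	assert(!(quirk_a & quirk_b));

	disabled_quirks |= quirk_a;		/* task A, under the lock */
	disabled_quirks |= quirk_b;		/* task B, under the lock */

	return disabled_quirks;
}
```

Starting from zero with bits 0x1 and 0x2, the unserialized interleaving
ends with only 0x2 set, while the serialized version ends with 0x3.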
> 
> Ideally, KVM would disallow disabling quirks once quirks are "live", but
> that would be a potentially breaking userspace ABI change, and while all
> existing quirks are fully live only after vCPUs have been created, several
> MMU-related quirks, IGNORE_GUEST_PAT and SLOT_ZAP_ALL, are partially live
> at all times.  Because populating MMUs requires a vCPU, the guest-visible
> behavior of IGNORE_GUEST_PAT and SLOT_ZAP_ALL requires a vCPU, but for KVM
> itself, processing the quirk (or not) has functional impact, i.e. for all
> intents and purposes, KVM can't prevent those quirks from being disabled
> after they've been consumed.
> 
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/x86.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6b9a1b0b1460..74f1d7169218 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3939,7 +3939,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  			break;
>  		fallthrough;
>  	case KVM_CAP_DISABLE_QUIRKS:
> +		mutex_lock(&kvm->lock);
>  		kvm->arch.disabled_quirks |= cap->args[0] & kvm_caps.supported_quirks;
> +		mutex_unlock(&kvm->lock);
>  		r = 0;
>  		break;
>  	case KVM_CAP_SPLIT_IRQCHIP: {
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

* Re: [PATCH v3 06/12] KVM: x86: Ensure runtime reads of disabled_quirks are resolved once
  2026-06-30 22:26 ` [PATCH v3 06/12] KVM: x86: Ensure runtime reads of disabled_quirks are resolved once Sean Christopherson
@ 2026-07-01 22:00   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 22:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:26:01PM -0700, Sean Christopherson wrote:
> Wrap the sole reader of disabled_quirks with READ_ONCE(), and wrap the
> post-VM-creation write to disabled_quirks with WRITE_ONCE(), to ensure
> checking the status of a quirk doesn't re-read disabled_quirks *if* the
> caller needs such a guarantee.  This will allow splitting the "fast" MMU
> zap into front and back halves, without potentially skipping the back
> half if SLOT_ZAP_ALL were concurrently disabled (which would be "fine" in
> the current code base, but far from ideal).
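
A rough userspace sketch of the front/back-half concern follows.  The
MODEL_READ_ONCE_U64() macro is a simplified stand-in for the kernel's
READ_ONCE(), the quirk bit value is made up, and the two "zap" functions
are hypothetical; the concurrent disable is simulated by a plain function
call between the halves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's READ_ONCE(), fixed to u64. */
#define MODEL_READ_ONCE_U64(x)	(*(const volatile uint64_t *)&(x))

/* Hypothetical bit value, chosen only for this model. */
#define MODEL_QUIRK_SLOT_ZAP_ALL	(UINT64_C(1) << 0)

struct model_kvm {
	uint64_t disabled_quirks;
	int front_halves;
	int back_halves;
};

static bool model_check_has_quirk(struct model_kvm *kvm, uint64_t quirk)
{
	return !(MODEL_READ_ONCE_U64(kvm->disabled_quirks) & quirk);
}

/* Simulates userspace disabling the quirk between the two halves. */
static void model_concurrent_disable(struct model_kvm *kvm)
{
	kvm->disabled_quirks |= MODEL_QUIRK_SLOT_ZAP_ALL;
}

/* Checks the quirk once per half: the back half can be skipped. */
static void model_zap_rechecking(struct model_kvm *kvm)
{
	assert(kvm);

	if (model_check_has_quirk(kvm, MODEL_QUIRK_SLOT_ZAP_ALL))
		kvm->front_halves++;

	model_concurrent_disable(kvm);

	if (model_check_has_quirk(kvm, MODEL_QUIRK_SLOT_ZAP_ALL))
		kvm->back_halves++;
}

/* Resolves the quirk once and reuses the answer for both halves. */
static void model_zap_snapshot(struct model_kvm *kvm)
{
	bool zap_all;

	assert(kvm);

	zap_all = model_check_has_quirk(kvm, MODEL_QUIRK_SLOT_ZAP_ALL);
	if (zap_all)
		kvm->front_halves++;

	model_concurrent_disable(kvm);

	if (zap_all)
		kvm->back_halves++;
}
```

The snapshot variant always runs the halves as a pair, whichever way the
single read resolves; the volatile access is what keeps the compiler from
quietly turning that single read back into two.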
> 
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/x86.c | 3 ++-
>  arch/x86/kvm/x86.h | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 74f1d7169218..bc0c3163f4a3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3940,7 +3940,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		fallthrough;
>  	case KVM_CAP_DISABLE_QUIRKS:
>  		mutex_lock(&kvm->lock);
> -		kvm->arch.disabled_quirks |= cap->args[0] & kvm_caps.supported_quirks;
> +		WRITE_ONCE(kvm->arch.disabled_quirks,
> +			   kvm->arch.disabled_quirks | (cap->args[0] & kvm_caps.supported_quirks));
>  		mutex_unlock(&kvm->lock);
>  		r = 0;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 8ece468087a8..75f13d88db58 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -304,7 +304,7 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
>  
>  static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
>  {
> -	return !(kvm->arch.disabled_quirks & quirk);
> +	return !(READ_ONCE(kvm->arch.disabled_quirks) & quirk);
>  }
>  
>  static __always_inline void kvm_request_l1tf_flush_l1d(void)
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

* Re: [PATCH v3 07/12] KVM: x86/mmu: Fold kvm_mmu_zap_memslot() into kvm_arch_flush_shadow_memslot()
  2026-06-30 22:26 ` [PATCH v3 07/12] KVM: x86/mmu: Fold kvm_mmu_zap_memslot() into kvm_arch_flush_shadow_memslot() Sean Christopherson
@ 2026-07-01 22:04   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 22:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:26:02PM -0700, Sean Christopherson wrote:
> Fold kvm_mmu_zap_memslot() into its sole caller so that its GFN range
> structure can be used to trigger guest_memfd invalidations regardless of
> whether KVM will do a partial or full zap of the MMU.
> 
> No functional change intended.
> 
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/mmu/mmu.c | 35 +++++++++++++++--------------------
>  1 file changed, 15 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6c13da942bfc..223d80b12b9b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7560,8 +7560,14 @@ static void kvm_mmu_zap_memslot_pages_and_flush(struct kvm *kvm,
>  	kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
>  }
>  
> -static void kvm_mmu_zap_memslot(struct kvm *kvm,
> -				struct kvm_memory_slot *slot)
> +static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
> +{
> +	return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
> +	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
> +}
> +
> +void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> +				   struct kvm_memory_slot *slot)
>  {
>  	struct kvm_gfn_range range = {
>  		.slot = slot,
> @@ -7572,25 +7578,14 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm,
>  	};
>  	bool flush;
>  
> -	write_lock(&kvm->mmu_lock);
> -	flush = kvm_unmap_gfn_range(kvm, &range);
> -	kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
> -	write_unlock(&kvm->mmu_lock);
> -}
> -
> -static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
> -{
> -	return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
> -	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
> -}
> -
> -void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> -				   struct kvm_memory_slot *slot)
> -{
> -	if (kvm_memslot_flush_zap_all(kvm))
> +	if (kvm_memslot_flush_zap_all(kvm)) {
>  		kvm_mmu_zap_all_fast(kvm);
> -	else
> -		kvm_mmu_zap_memslot(kvm, slot);
> +	} else {
> +		write_lock(&kvm->mmu_lock);
> +		flush = kvm_unmap_gfn_range(kvm, &range);
> +		kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
> +		write_unlock(&kvm->mmu_lock);
> +	}
>  }
>  
>  void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

* Re: [PATCH v3 08/12] KVM: x86/mmu: Split kvm_mmu_zap_all_fast() into "front" and "back" halves
  2026-06-30 22:26 ` [PATCH v3 08/12] KVM: x86/mmu: Split kvm_mmu_zap_all_fast() into "front" and "back" halves Sean Christopherson
@ 2026-07-01 22:07   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 22:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:26:03PM -0700, Sean Christopherson wrote:
> Split kvm_mmu_zap_all_fast() into a "front half" and a "back half", where
> the front half is everything that runs with mmu_lock held for write, and
> the back half is the code that runs outside of mmu_lock.  This will allow
> putting more code inside kvm_arch_flush_shadow_memslot()'s critical section
> without having to take mmu_lock twice in quick succession.
> 
> No functional change intended.
> 
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/mmu/mmu.c | 37 +++++++++++++++++++++++++------------
>  1 file changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 223d80b12b9b..a5c2a560a88a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6921,20 +6921,11 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
>  	kvm_mmu_commit_zap_page(kvm, &invalid_list);
>  }
>  
> -/*
> - * Fast invalidate all shadow pages and use lock-break technique
> - * to zap obsolete pages.
> - *
> - * It's required when memslot is being deleted or VM is being
> - * destroyed, in these cases, we should ensure that KVM MMU does
> - * not use any resource of the being-deleted slot or all slots
> - * after calling the function.
> - */
> -static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> +static void __kvm_mmu_zap_all_fast_front_half(struct kvm *kvm)
>  {
>  	lockdep_assert_held(&kvm->slots_lock);
> +	lockdep_assert_held_write(&kvm->mmu_lock);
>  
> -	write_lock(&kvm->mmu_lock);
>  	trace_kvm_mmu_zap_all_fast(kvm);
>  
>  	/*
> @@ -6971,8 +6962,12 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
>  	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
>  
>  	kvm_zap_obsolete_pages(kvm);
> +}
>  
> -	write_unlock(&kvm->mmu_lock);
> +static void __kvm_mmu_zap_all_fast_back_half(struct kvm *kvm)
> +{
> +	lockdep_assert_held(&kvm->slots_lock);
> +	lockdep_assert_not_held(&kvm->mmu_lock);
>  
>  	/*
>  	 * Zap the invalidated TDP MMU roots, all SPTEs must be dropped before
> @@ -6986,6 +6981,24 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
>  		kvm_tdp_mmu_zap_invalidated_roots(kvm, true);
>  }
>  
> +/*
> + * Fast invalidate all shadow pages and use lock-break technique
> + * to zap obsolete pages.
> + *
> + * It's required when memslot is being deleted or VM is being
> + * destroyed, in these cases, we should ensure that KVM MMU does
> + * not use any resource of the being-deleted slot or all slots
> + * after calling the function.
> + */
> +static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> +{
> +	write_lock(&kvm->mmu_lock);
> +	__kvm_mmu_zap_all_fast_front_half(kvm);
> +	write_unlock(&kvm->mmu_lock);
> +
> +	__kvm_mmu_zap_all_fast_back_half(kvm);
> +}
> +
>  int kvm_mmu_init_vm(struct kvm *kvm)
>  {
>  	int r, i;
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

* Re: [PATCH v3 09/12] KVM: x86/mmu: Use split "zap all fast" helpers when invalidating memslot
  2026-06-30 22:26 ` [PATCH v3 09/12] KVM: x86/mmu: Use split "zap all fast" helpers when invalidating memslot Sean Christopherson
@ 2026-07-01 22:19   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 22:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:26:04PM -0700, Sean Christopherson wrote:
> Manually invoke the front half and back half of the "zap all fast" flow
> when invalidating a memslot so that mmu_lock is acquired at function scope
> in kvm_arch_flush_shadow_memslot().   This will allow putting more code
> inside the critical section without having to take mmu_lock twice in quick
> succession.
> 
> Opportunistically open code checking whether or not to do the fast zap, to
> discourage removing the local "zap_all" in a future cleanup, i.e. to ensure
> the SLOT_ZAP_ALL quirk is queried exactly once.  Processing the front half

Enforcement through finger punishment? That sounds surprisingly effective :)

> but not the back half of the fast zap (if SLOT_ZAP_ALL were disabled
> concurrently) would result in KVM unnecessarily keeping invalid TDP MMU
> roots until the VM is destroyed.
> 
> No functional change intended.
> 
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/mmu/mmu.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a5c2a560a88a..3eb1f86593b1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7573,12 +7573,6 @@ static void kvm_mmu_zap_memslot_pages_and_flush(struct kvm *kvm,
>  	kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
>  }
>  
> -static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
> -{
> -	return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
> -	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
> -}
> -
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> @@ -7589,16 +7583,23 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  		.may_block = true,
>  		.attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED,
>  	};
> +	bool zap_all = kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
> +		       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
>  	bool flush;
>  
> -	if (kvm_memslot_flush_zap_all(kvm)) {
> -		kvm_mmu_zap_all_fast(kvm);
> +	write_lock(&kvm->mmu_lock);
> +
> +	if (zap_all) {
> +		__kvm_mmu_zap_all_fast_front_half(kvm);
>  	} else {
> -		write_lock(&kvm->mmu_lock);
>  		flush = kvm_unmap_gfn_range(kvm, &range);
>  		kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
> -		write_unlock(&kvm->mmu_lock);
>  	}
> +
> +	write_unlock(&kvm->mmu_lock);
> +
> +	if (zap_all)
> +		__kvm_mmu_zap_all_fast_back_half(kvm);
>  }
>  
>  void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 
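
The "decide once, then run matching halves" structure described in the changelog above can be sketched in plain C. This is a toy model under stated assumptions, not the KVM implementation: struct toy_mmu, the toy lock, and all function names are hypothetical, and the lock is a single-threaded stand-in whose only job is to check, via assert(), that each half runs in the locking context it expects.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mmu_lock: tracks only whether it is held. */
struct toy_lock {
	bool held;
};

struct toy_mmu {
	struct toy_lock lock;
	bool zap_all_quirk;	/* may be flipped by another thread in real code */
	int front_runs, back_runs, range_runs;
};

static void toy_write_lock(struct toy_lock *l)   { assert(!l->held); l->held = true; }
static void toy_write_unlock(struct toy_lock *l) { assert(l->held);  l->held = false; }

/* Front half: must run with the lock held. */
static void zap_all_front_half(struct toy_mmu *m) { assert(m->lock.held);  m->front_runs++; }
/* Back half: must run after the lock has been dropped. */
static void zap_all_back_half(struct toy_mmu *m)  { assert(!m->lock.held); m->back_runs++; }
/* Targeted zap of a single range, also under the lock. */
static void zap_range(struct toy_mmu *m)          { assert(m->lock.held);  m->range_runs++; }

static void flush_slot(struct toy_mmu *m)
{
	/* Query the quirk exactly once so the two halves always pair up. */
	bool zap_all = m->zap_all_quirk;

	toy_write_lock(&m->lock);
	if (zap_all)
		zap_all_front_half(m);
	else
		zap_range(m);
	toy_write_unlock(&m->lock);

	if (zap_all)
		zap_all_back_half(m);
}
```

Re-reading zap_all_quirk for the second check, instead of using the local, is what would allow a front half with no matching back half if the flag changed in between.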

* Re: [PATCH v3 11/12] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y
  2026-06-30 22:26 ` [PATCH v3 11/12] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y Sean Christopherson
@ 2026-07-01 22:42   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 22:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:26:06PM -0700, Sean Christopherson wrote:
> Wrap the .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE so that
> non-SEV code doesn't try to wire up a callback without doing the necessary
> enabling.
> 
> No functional change intended.
> 
> Fixes: 3bb2531e20bf ("KVM: guest_memfd: Add hook for initializing memory")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

Not sure about the need for the Fixes tag, but it certainly guards against
some future problems.

> ---
>  arch/x86/include/asm/kvm-x86-ops.h | 4 +++-
>  arch/x86/include/asm/kvm_host.h    | 2 ++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index deb3ded5796e..39247d2f29d6 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -146,12 +146,14 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
>  KVM_X86_OP_OPTIONAL(get_untagged_addr)
>  KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
>  KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
> -KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
> +#endif
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  KVM_X86_OP_OPTIONAL(gmem_invalidate_range)
>  KVM_X86_OP_OPTIONAL(gmem_reclaim_memory)
>  #endif
> +KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
>  #endif
>  
>  #undef KVM_X86_OP
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 93af3bb82869..cf2ec19212ad 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1903,7 +1903,9 @@ struct kvm_x86_ops {
>  
>  	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
>  	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
>  	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> +#endif
>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  	void (*gmem_invalidate_range)(struct kvm *kvm, struct kvm_gfn_range *range);
>  	void (*gmem_reclaim_memory)(kvm_pfn_t start, kvm_pfn_t end);
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 
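
As a rough illustration of why the declaration and the ops-table entry have to be guarded together, here is a single-file sketch of a config-gated X-macro. It deliberately uses a different mechanism from the patch: the kernel puts #ifdef directly in a header that is included multiple times, whereas a one-file example has to route the optional entry through a helper macro, because #ifdef cannot appear inside a macro body. HAVE_GMEM_PREPARE, struct toy_ops and the OP names are hypothetical.

```c
#include <stddef.h>

/* Uncomment to model a build with the optional hook enabled. */
/* #define HAVE_GMEM_PREPARE 1 */

/* The optional entry expands to nothing when the config symbol is unset. */
#ifdef HAVE_GMEM_PREPARE
#define GMEM_PREPARE_OPS(OP) OP(gmem_prepare)
#else
#define GMEM_PREPARE_OPS(OP)
#endif

/* Single source of truth for every hook. */
#define TOY_OPS_LIST(OP)		\
	OP(alloc_backing_page)		\
	GMEM_PREPARE_OPS(OP)		\
	OP(max_mapping_level)

/* Expansion 1: one function pointer per listed hook. */
struct toy_ops {
#define OP(name) int (*name)(int arg);
	TOY_OPS_LIST(OP)
#undef OP
};

/* Expansion 2: the matching hook names, in the same order. */
static const char *const toy_op_names[] = {
#define OP(name) #name,
	TOY_OPS_LIST(OP)
#undef OP
};

#define TOY_NR_OPS (sizeof(toy_op_names) / sizeof(toy_op_names[0]))
```

With the symbol unset, any code that tries to assign a gmem_prepare member fails to compile, which is the kind of early failure the changelog is after.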

* Re: [PATCH v3 12/12] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid
  2026-06-30 22:26 ` [PATCH v3 12/12] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid Sean Christopherson
@ 2026-07-01 22:47   ` Michael Roth
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Roth @ 2026-07-01 22:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Ackerley Tng, Hyunwoo Kim,
	Tom Lendacky, Jörg Rödel, Fuad Tabba

On Tue, Jun 30, 2026 at 03:26:07PM -0700, Sean Christopherson wrote:
> Track the guest as having a guest-provided VMSA as soon as control.vmsa_pa
> is invalidated, instead of waiting to see if the guest-provided VMSA is
> usable, so that KVM doesn't switch back to the original VMSA instead of
> exiting to userspace (due to an invalid VMSA).  By the time a vCPU tries
> to load a guest-provided VMSA, KVM has already communicated "success" for
> AP creation, i.e. KVM has committed to using the guest-provided VMSA.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Michael Roth <michael.roth@amd.com>

> ---
>  arch/x86/kvm/svm/sev.c | 34 +++++++++++++++++-----------------
>  1 file changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2d2c159f20c2..ec426a5582aa 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4005,23 +4005,6 @@ static void __sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
>  	 */
>  	vmcb_mark_all_dirty(svm->vmcb);
>  
> -	if (!VALID_PAGE(gpa))
> -		return;
> -
> -	slot = gfn_to_memslot(vcpu->kvm, gfn);
> -	if (!slot)
> -		return;
> -
> -	mmu_seq = kvm->mmu_invalidate_seq;
> -	smp_rmb();
> -
> -	/*
> -	 * The new VMSA will be private memory guest memory, so retrieve the
> -	 * PFN from the gmem backend.
> -	 */
> -	if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
> -		return;
> -
>  	/*
>  	 * From this point forward, the VMSA will always be a guest-mapped page
>  	 * rather than the initial one allocated by KVM in svm->sev_es.vmsa. In
> @@ -4033,6 +4016,23 @@ static void __sev_snp_reload_vmsa(struct kvm_vcpu *vcpu, gpa_t gpa)
>  	 */
>  	svm->sev_es.snp_has_guest_vmsa = true;
>  
> +	if (!VALID_PAGE(gpa))
> +		return;
> +
> +	slot = gfn_to_memslot(vcpu->kvm, gfn);
> +	if (!slot)
> +		return;
> +
> +	mmu_seq = kvm->mmu_invalidate_seq;
> +	smp_rmb();
> +
> +	/*
> +	 * The new VMSA will be private memory guest memory, so retrieve the
> +	 * PFN from the gmem backend.
> +	 */
> +	if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
> +		return;
> +
>  	read_lock(&kvm->mmu_lock);
>  	/*
>  	 * Save the guest-provided GPA.  If retry is needed, then KVM will try
> -- 
> 2.55.0.rc0.799.gd6f94ed593-goog
> 

Thread overview: 26+ messages
2026-06-30 22:25 [PATCH v3 00/12] KVM: SEV: Fix RMP #PF due to freeing in-use VMSA Sean Christopherson
2026-06-30 22:25 ` [PATCH v3 01/12] KVM: SEV: Track the GPA of the guest-controlled VMSA used for SNP guests Sean Christopherson
2026-07-01 19:33   ` Michael Roth
2026-06-30 22:25 ` [PATCH v3 02/12] KVM: SEV: Extract loading of guest-provided VMSA to a separate helper Sean Christopherson
2026-07-01 19:34   ` Michael Roth
2026-06-30 22:25 ` [PATCH v3 03/12] KVM: SEV: Mark vCPU RUNNABLE after AP_CREATE, even if VMSA is unusable Sean Christopherson
2026-07-01 19:36   ` Michael Roth
2026-06-30 22:25 ` [PATCH v3 04/12] KVM: Rename .gmem_invalidate() to .gmem_reclaim_memory() Sean Christopherson
2026-06-30 22:39   ` sashiko-bot
2026-07-01 19:41   ` Michael Roth
2026-06-30 22:26 ` [PATCH v3 05/12] KVM: x86: Serialize writes to disabled_quirks using kvm->lock Sean Christopherson
2026-07-01 21:59   ` Michael Roth
2026-06-30 22:26 ` [PATCH v3 06/12] KVM: x86: Ensure runtime reads of disabled_quirks are resolved once Sean Christopherson
2026-07-01 22:00   ` Michael Roth
2026-06-30 22:26 ` [PATCH v3 07/12] KVM: x86/mmu: Fold kvm_mmu_zap_memslot() into kvm_arch_flush_shadow_memslot() Sean Christopherson
2026-07-01 22:04   ` Michael Roth
2026-06-30 22:26 ` [PATCH v3 08/12] KVM: x86/mmu: Split kvm_mmu_zap_all_fast() into "front" and "back" halves Sean Christopherson
2026-07-01 22:07   ` Michael Roth
2026-06-30 22:26 ` [PATCH v3 09/12] KVM: x86/mmu: Use split "zap all fast" helpers when invalidating memslot Sean Christopherson
2026-07-01 22:19   ` Michael Roth
2026-06-30 22:26 ` [PATCH v3 10/12] KVM: SEV: Forcefully invalidate SNP VMSA if its backing gmem page is zapped Sean Christopherson
2026-07-01 21:56   ` Michael Roth
2026-06-30 22:26 ` [PATCH v3 11/12] KVM: x86: Guard .gmem_prepare() declarations with HAVE_KVM_GMEM_PREPARE=y Sean Christopherson
2026-07-01 22:42   ` Michael Roth
2026-06-30 22:26 ` [PATCH v3 12/12] KVM: SEV: Mark vCPU has having guest-provided VMSA even if its invalid Sean Christopherson
2026-07-01 22:47   ` Michael Roth
