* [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests
@ 2026-06-11 12:35 Jörg Rödel
2026-06-11 12:35 ` [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs Jörg Rödel
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 12:35 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm,
Joerg Roedel
From: Joerg Roedel <joerg.roedel@amd.com>
Hi,
Here is a set of patches that allows the VMM to provide a VMSA directly
to KVM, which then uses it for the BSP of the SEV-SNP VM. The use case
is IGVM loading, where the IGVM file contains a VMSA image that must be
loaded into the initial memory image of the VM as-is to guarantee the
expected launch measurement.
The first patch reworks guest-VMSA handling and streamlines the state
handling to make it clearer and more maintainable. That patch accounts
for the biggest part of the changes.
I have tested these changes together with the planes patches and
COCONUT-SVSM and can confirm that the launch measurement is correct
again with these changes.
The changes are based on previous work by Roy Hopkins[1].
Please review.
Thanks,
Joerg
[1] https://github.com/torvalds/linux/commit/e00e081276b2cd9f1400ec5b1a9cd97f8b5c4d58
Joerg Roedel (4):
kvm: svm: Streamline VMSA setting for VCPUs
kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage
kvm: svm: Support guest-provided VMSA for launching
kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/sev.c | 419 +++++++++++++++++++++++---------
arch/x86/kvm/svm/svm.h | 32 ++-
arch/x86/kvm/x86.c | 1 +
include/uapi/linux/kvm.h | 1 +
5 files changed, 337 insertions(+), 117 deletions(-)
--
2.53.0
* [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs
2026-06-11 12:35 [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests Jörg Rödel
@ 2026-06-11 12:35 ` Jörg Rödel
2026-06-11 12:56 ` sashiko-bot
2026-06-11 12:35 ` [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage Jörg Rödel
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 12:35 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm,
Joerg Roedel
From: Joerg Roedel <joerg.roedel@amd.com>
Streamline the VMSA setting state of vcpus, where a VMSA can be either
KVM-allocated or guest-provided. This consolidates the various
tracking state around VMSAs.
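
The state transitions described above can be sketched as a small userspace
model. This is only an illustration under simplifying assumptions, not the
kernel code: the names struct vmsa_model and model_*() are hypothetical,
calloc() stands in for snp_safe_alloc_page(), and RMP updates, cache flushes
and firmware commands are left out entirely. The patch keeps the page pointer
and the GPA in a union; the model uses two plain fields for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified userspace model of the VMSA states; not the kernel code. */
enum vmsa_state { VMSA_NONE, VMSA_SHARED, VMSA_PRIVATE, VMSA_GUEST };

struct vmsa_model {
	enum vmsa_state state;
	void *page;             /* meaningful for VMSA_SHARED / VMSA_PRIVATE */
	unsigned long long gpa; /* meaningful for VMSA_GUEST */
};

/* NONE -> SHARED: the host allocates a page it can still read and write. */
int model_alloc(struct vmsa_model *m)
{
	if (m->state != VMSA_NONE)
		return -1;
	m->page = calloc(1, 4096);
	if (!m->page)
		return -1;
	m->state = VMSA_SHARED;
	return 0;
}

/* SHARED -> PRIVATE: from here on no plaintext reference is handed out. */
int model_make_private(struct vmsa_model *m)
{
	if (m->state != VMSA_SHARED)
		return -1;
	m->state = VMSA_PRIVATE;
	return 0;
}

/* A plaintext pointer is only available while the page is still shared. */
void *model_ref(const struct vmsa_model *m)
{
	return m->state == VMSA_SHARED ? m->page : NULL;
}

/* Any state -> NONE: only host-allocated pages are freed. */
void model_free(struct vmsa_model *m)
{
	switch (m->state) {
	case VMSA_SHARED:
	case VMSA_PRIVATE:
		free(m->page);
		break;
	case VMSA_NONE:
	case VMSA_GUEST:
		break;
	default:
		assert(false); /* unknown state, mirrors BUG() in the patch */
	}
	m->page = NULL;
	m->state = VMSA_NONE;
}

/* Switch to a guest-owned VMSA, dropping any host-allocated page first. */
void model_set_guest(struct vmsa_model *m, unsigned long long gpa)
{
	model_free(m);
	m->state = VMSA_GUEST;
	m->gpa = gpa;
}
```

The point of the single state field is that each helper can reject calls made
in the wrong state, instead of inferring the situation from several flags.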
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
arch/x86/kvm/svm/sev.c | 301 ++++++++++++++++++++++++++++-------------
arch/x86/kvm/svm/svm.h | 31 ++++-
2 files changed, 237 insertions(+), 95 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6c6a6d663e29..9b1280222e20 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -147,6 +147,9 @@ static bool sev_snp_guest(struct kvm *kvm)
}
static int snp_decommission_context(struct kvm *kvm);
+static int kvm_rmp_make_shared(struct kvm *kvm, u64 pfn, enum pg_level level);
+static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va);
+static int snp_page_reclaim(struct kvm *kvm, u64 pfn);
struct enc_region {
struct list_head list;
@@ -156,6 +159,173 @@ struct enc_region {
unsigned long size;
};
+static void *sev_es_vmsa_ref(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ void *vmsa = NULL;
+
+ if (svm->sev_es.vmsa.vmsa_state == VMSA_SHARED) {
+ vmsa = page_address(svm->sev_es.vmsa.vmsa_page);
+ }
+
+ return vmsa;
+}
+
+static int sev_es_vcpu_alloc_vmsa(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct page *vmsa_page;
+
+ if (WARN_ON_ONCE(svm->sev_es.vmsa.vmsa_state != VMSA_NONE))
+ return -EINVAL;
+
+ /*
+ * SEV-ES guests require a separate (from the VMCB) VMSA page used to
+ * contain the encrypted register state of the guest.
+ */
+ vmsa_page = snp_safe_alloc_page();
+ if (!vmsa_page)
+ return -ENOMEM;
+
+ svm->sev_es.vmsa.vmsa_state = VMSA_SHARED;
+ svm->sev_es.vmsa.vmsa_page = vmsa_page;
+
+ return 0;
+}
+
+static int sev_es_vcpu_vmsa_make_private(struct kvm_vcpu *vcpu)
+{
+ struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
+ struct vcpu_svm *svm = to_svm(vcpu);
+ void *vmsa = sev_es_vmsa_ref(vcpu);
+
+ if (!vmsa)
+ return -EINVAL;
+
+ if (is_sev_snp_guest(vcpu)) {
+ u64 pfn = __pa(vmsa) >> PAGE_SHIFT;
+ int ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+ }
+
+ svm->sev_es.vmsa.vmsa_state = VMSA_PRIVATE;
+
+ return 0;
+}
+
+static void sev_es_vcpu_free_vmsa(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ void *vmsa_ptr;
+
+ switch (svm->sev_es.vmsa.vmsa_state) {
+ case VMSA_NONE:
+ case VMSA_GUEST:
+ break;
+ case VMSA_PRIVATE:
+ vmsa_ptr = page_address(svm->sev_es.vmsa.vmsa_page);
+
+ if (is_sev_snp_guest(vcpu)) {
+ u64 pfn = __pa(vmsa_ptr) >> PAGE_SHIFT;
+
+ if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K)) {
+ pr_err("Failed to make VMSA page shared - leaking it to avoid re-use\n");
+ goto out;
+ }
+ }
+
+ if (vcpu->arch.guest_state_protected)
+ sev_flush_encrypted_page(vcpu, vmsa_ptr);
+
+ fallthrough;
+ case VMSA_SHARED:
+ __free_page(svm->sev_es.vmsa.vmsa_page);
+ break;
+ default:
+ BUG();
+ }
+out:
+
+ svm->sev_es.vmsa.vmsa_page = NULL;
+ svm->sev_es.vmsa.vmsa_state = VMSA_NONE;
+}
+
+static void sev_snp_vcpu_reclaim_vmsa(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ void *vmsa_ptr;
+ u64 pfn;
+
+ if (WARN_ON_ONCE(!is_sev_snp_guest(vcpu) ||
+ svm->sev_es.vmsa.vmsa_state != VMSA_PRIVATE))
+ return;
+
+ vmsa_ptr = page_address(svm->sev_es.vmsa.vmsa_page);
+ pfn = __pa(vmsa_ptr) >> PAGE_SHIFT;
+
+ if (!snp_page_reclaim(vcpu->kvm, pfn))
+ __free_page(svm->sev_es.vmsa.vmsa_page);
+
+ svm->sev_es.vmsa.vmsa_page = NULL;
+ svm->sev_es.vmsa.vmsa_state = VMSA_NONE;
+}
+
+static void sev_es_set_guest_vmsa(struct kvm_vcpu *vcpu, gpa_t vmsa_gpa)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ sev_es_vcpu_free_vmsa(vcpu);
+
+ svm->sev_es.vmsa.vmsa_state = VMSA_GUEST;
+ svm->sev_es.vmsa.vmsa_gpa = vmsa_gpa;
+}
+
+static u64 sev_es_vmsa_pa(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ enum vmsa_state vmsa_state = svm->sev_es.vmsa.vmsa_state;
+ u64 vmsa_pa = INVALID_PAGE;
+
+ if (vmsa_state == VMSA_GUEST) {
+ gpa_t vmsa_gpa = svm->sev_es.vmsa.vmsa_gpa;
+ struct kvm_memory_slot *slot;
+ struct page *page;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ gfn = gpa_to_gfn(vmsa_gpa);
+
+ slot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!slot)
+ goto out;
+
+ /*
+ * The new VMSA will be private guest memory, so retrieve the
+ * PFN from the gmem backend.
+ */
+ if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
+ goto out;
+
+ vmsa_pa = pfn_to_hpa(pfn);
+
+ /*
+ * gmem pages aren't currently migratable, but if this ever changes
+ * then care should be taken to ensure the guest vmsa is pinned
+ * through some other means.
+ */
+ kvm_release_page_clean(page);
+ } else if (vmsa_state == VMSA_PRIVATE || vmsa_state == VMSA_SHARED) {
+ vmsa_pa = __pa(page_address(svm->sev_es.vmsa.vmsa_page));
+ }
+
+out:
+ return vmsa_pa;
+}
+
/* Called with the sev_bitmap_lock held, or on shutdown */
static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid)
{
@@ -925,7 +1095,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
- struct sev_es_save_area *save = svm->sev_es.vmsa;
+ struct sev_es_save_area *save = sev_es_vmsa_ref(vcpu);
struct xregs_state *xsave;
const u8 *s;
u8 *d;
@@ -1026,6 +1196,7 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
{
struct sev_data_launch_update_vmsa vmsa;
struct vcpu_svm *svm = to_svm(vcpu);
+ void *vmsa_ref = sev_es_vmsa_ref(vcpu);
int ret;
if (vcpu->guest_debug) {
@@ -1043,15 +1214,19 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
* the VMSA memory content (i.e it will write the same memory region
* with the guest's key), so invalidate it first.
*/
- clflush_cache_range(svm->sev_es.vmsa, PAGE_SIZE);
+ clflush_cache_range(vmsa_ref, PAGE_SIZE);
vmsa.reserved = 0;
vmsa.handle = to_kvm_sev_info(kvm)->handle;
- vmsa.address = __sme_pa(svm->sev_es.vmsa);
+ vmsa.address = __sme_pa(vmsa_ref);
vmsa.len = PAGE_SIZE;
ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, &vmsa, error);
if (ret)
- return ret;
+ goto free_vmsa;
+
+ ret = sev_es_vcpu_vmsa_make_private(vcpu);
+ if (ret)
+ goto free_vmsa;
/*
* SEV-ES guests maintain an encrypted version of their FPU
@@ -1069,7 +1244,13 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
* MSR_IA32_DEBUGCTLMSR when guest_state_protected is not set.
*/
svm_enable_lbrv(vcpu);
+
return 0;
+
+free_vmsa:
+ sev_es_vcpu_free_vmsa(vcpu);
+
+ return ret;
}
static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -2508,23 +2689,22 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
kvm_for_each_vcpu(i, vcpu, kvm) {
struct vcpu_svm *svm = to_svm(vcpu);
- u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+ void *vmsa = sev_es_vmsa_ref(vcpu);
ret = sev_es_sync_vmsa(svm);
if (ret)
goto out;
- /* Transition the VMSA page to a firmware state. */
- ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+ ret = sev_es_vcpu_vmsa_make_private(vcpu);
if (ret)
goto out;
/* Issue the SNP command to encrypt the VMSA */
- data.address = __sme_pa(svm->sev_es.vmsa);
+ data.address = __sme_pa(vmsa);
ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
&data, &argp->error);
if (ret) {
- snp_page_reclaim(kvm, pfn);
+ sev_snp_vcpu_reclaim_vmsa(vcpu);
goto out;
}
@@ -3593,31 +3773,13 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
void sev_free_vcpu(struct kvm_vcpu *vcpu)
{
- struct vcpu_svm *svm;
+ struct vcpu_svm *svm = to_svm(vcpu);
if (!is_sev_es_guest(vcpu))
return;
- svm = to_svm(vcpu);
-
- /*
- * If it's an SNP guest, then the VMSA was marked in the RMP table as
- * a guest-owned page. Transition the page to hypervisor state before
- * releasing it back to the system.
- */
- if (is_sev_snp_guest(vcpu)) {
- u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
-
- if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K))
- goto skip_vmsa_free;
- }
-
- if (vcpu->arch.guest_state_protected)
- sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
-
- __free_page(virt_to_page(svm->sev_es.vmsa));
+ sev_es_vcpu_free_vmsa(vcpu);
-skip_vmsa_free:
__sev_es_unmap_ghcb(svm);
}
@@ -4067,10 +4229,7 @@ static int snp_begin_psc(struct vcpu_svm *svm)
static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- struct kvm_memory_slot *slot;
- struct page *page;
- kvm_pfn_t pfn;
- gfn_t gfn;
+ u64 vmsa_pa;
guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
@@ -4092,46 +4251,17 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
*/
vmcb_mark_all_dirty(svm->vmcb);
- if (!VALID_PAGE(svm->sev_es.snp_vmsa_gpa))
- return;
-
- gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
- svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ sev_es_set_guest_vmsa(vcpu, svm->sev_es.req_vmsa_gpa);
+ vmsa_pa = sev_es_vmsa_pa(vcpu);
- slot = gfn_to_memslot(vcpu->kvm, gfn);
- if (!slot)
+ if (!VALID_PAGE(vmsa_pa))
return;
- /*
- * The new VMSA will be private memory guest memory, so retrieve the
- * PFN from the gmem backend.
- */
- if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
- return;
-
- /*
- * From this point forward, the VMSA will always be a guest-mapped page
- * rather than the initial one allocated by KVM in svm->sev_es.vmsa. In
- * theory, svm->sev_es.vmsa could be free'd and cleaned up here, but
- * that involves cleanups like flushing caches, which would ideally be
- * handled during teardown rather than guest boot. Deferring that also
- * allows the existing logic for SEV-ES VMSAs to be re-used with
- * minimal SNP-specific changes.
- */
- svm->sev_es.snp_has_guest_vmsa = true;
-
/* Use the new VMSA */
- svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+ svm->vmcb->control.vmsa_pa = vmsa_pa;
/* Mark the vCPU as runnable */
kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
- /*
- * gmem pages aren't currently migratable, but if this ever changes
- * then care should be taken to ensure svm->sev_es.vmsa is pinned
- * through some other means.
- */
- kvm_release_page_clean(page);
}
static int sev_snp_ap_creation(struct vcpu_svm *svm)
@@ -4187,10 +4317,10 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return -EINVAL;
}
- target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ target_svm->sev_es.req_vmsa_gpa = svm->vmcb->control.exit_info_2;
break;
case SVM_VMGEXIT_AP_DESTROY:
- target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.req_vmsa_gpa = INVALID_PAGE;
break;
default:
vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
@@ -4708,20 +4838,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm, bool init_event)
struct vmcb *vmcb = svm->vmcb01.ptr;
svm->vmcb->control.misc_ctl |= SVM_MISC_ENABLE_SEV_ES;
-
- /*
- * An SEV-ES guest requires a VMSA area that is a separate from the
- * VMCB page. Do not include the encryption mask on the VMSA physical
- * address since hardware will access it using the guest key. Note,
- * the VMSA will be NULL if this vCPU is the destination for intrahost
- * migration, and will be copied later.
- */
- if (!svm->sev_es.snp_has_guest_vmsa) {
- if (svm->sev_es.vmsa)
- svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
- else
- svm->vmcb->control.vmsa_pa = INVALID_PAGE;
- }
+ svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(&svm->vcpu);
if (cpu_feature_enabled(X86_FEATURE_ALLOWED_SEV_FEATURES))
svm->vmcb->control.allowed_sev_features = sev->vmsa_features |
@@ -4797,7 +4914,7 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event)
int sev_vcpu_create(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- struct page *vmsa_page;
+ int ret;
mutex_init(&svm->sev_es.snp_vmsa_mutex);
@@ -4808,11 +4925,9 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu)
* SEV-ES guests require a separate (from the VMCB) VMSA page used to
* contain the encrypted register state of the guest.
*/
- vmsa_page = snp_safe_alloc_page();
- if (!vmsa_page)
- return -ENOMEM;
-
- svm->sev_es.vmsa = page_address(vmsa_page);
+ ret = sev_es_vcpu_alloc_vmsa(vcpu);
+ if (ret)
+ return ret;
vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
@@ -5227,12 +5342,14 @@ struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
if (!is_sev_es_guest(vcpu))
return NULL;
+ vmsa = sev_es_vmsa_ref(vcpu);
+
/*
* If the VMSA has not yet been encrypted, return a pointer to the
* current un-encrypted VMSA.
*/
- if (!vcpu->arch.guest_state_protected)
- return (struct vmcb_save_area *)svm->sev_es.vmsa;
+ if (vmsa)
+ return vmsa;
sev = to_kvm_sev_info(vcpu->kvm);
@@ -5303,8 +5420,10 @@ struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa)
{
+ struct vmcb_save_area *vmsa_ptr = sev_es_vmsa_ref(vcpu);
+
/* If the VMSA has not yet been encrypted, nothing was allocated */
- if (!vcpu->arch.guest_state_protected || !vmsa)
+ if (vmsa == vmsa_ptr)
return;
free_page((unsigned long)vmsa);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5137416be593..3d4799f09b23 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -240,9 +240,29 @@ struct svm_nested_state {
bool force_msr_bitmap_recalc;
};
+enum vmsa_state {
+ /* No VMSA set */
+ VMSA_NONE,
+ /* VMSA allocated by KVM - Shared in RMP (if applicable) */
+ VMSA_SHARED,
+ /* VMSA allocated by KVM - Guest-private in RMP (SEV-SNP only) */
+ VMSA_PRIVATE,
+ /* Guest-owned VMSA */
+ VMSA_GUEST,
+};
+
+struct sev_es_vmsa_state {
+ enum vmsa_state vmsa_state;
+ union {
+ /* vmsa_state == VMSA_SHARED || vmsa_state == VMSA_PRIVATE */
+ struct page *vmsa_page;
+ /* vmsa_state == VMSA_GUEST */
+ gpa_t vmsa_gpa;
+ };
+};
+
struct vcpu_sev_es_state {
/* SEV-ES support */
- struct sev_es_save_area *vmsa;
struct ghcb *ghcb;
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
@@ -266,10 +286,13 @@ struct vcpu_sev_es_state {
u64 ghcb_registered_gpa;
- struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
- gpa_t snp_vmsa_gpa;
+ /* VMSA related state */
+ struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+ struct sev_es_vmsa_state vmsa; /* VMSA currently used by the VCPU */
+ gpa_t req_vmsa_gpa; /* Requested new VMSA GPA */
+
+ bool snp_ap_runnable;
bool snp_ap_waiting_for_reset;
- bool snp_has_guest_vmsa;
};
struct vcpu_svm {
--
2.53.0
* [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage
2026-06-11 12:35 [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests Jörg Rödel
2026-06-11 12:35 ` [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs Jörg Rödel
@ 2026-06-11 12:35 ` Jörg Rödel
2026-06-11 12:58 ` sashiko-bot
2026-06-11 12:35 ` [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching Jörg Rödel
2026-06-11 12:35 ` [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE Jörg Rödel
3 siblings, 1 reply; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 12:35 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm,
Joerg Roedel
From: Joerg Roedel <joerg.roedel@amd.com>
Do not allocate a KVM-managed VMSA for every VCPU at VCPU creation.
Instead, defer the allocation to the LAUNCH_FINISH stage of SEV-ES and
SEV-SNP, which is where the VMSAs are used for the first time.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
arch/x86/kvm/svm/sev.c | 40 +++++++++++++++++++++++-----------------
1 file changed, 23 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9b1280222e20..350bb97c32c0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1095,11 +1095,11 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
- struct sev_es_save_area *save = sev_es_vmsa_ref(vcpu);
+ struct sev_es_save_area *save;
struct xregs_state *xsave;
const u8 *s;
+ int ret, i;
u8 *d;
- int i;
lockdep_assert_held(&vcpu->mutex);
@@ -1110,6 +1110,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
return -EINVAL;
+ ret = sev_es_vcpu_alloc_vmsa(vcpu);
+ if (ret)
+ return ret;
+
+ save = sev_es_vmsa_ref(vcpu);
+
/*
* SEV-ES will use a VMSA that is pointed to by the VMCB, not
* the traditional VMSA that is part of the VMCB. Copy the
@@ -1196,7 +1202,7 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
{
struct sev_data_launch_update_vmsa vmsa;
struct vcpu_svm *svm = to_svm(vcpu);
- void *vmsa_ref = sev_es_vmsa_ref(vcpu);
+ void *vmsa_ref;
int ret;
if (vcpu->guest_debug) {
@@ -1209,6 +1215,8 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
if (ret)
return ret;
+ vmsa_ref = sev_es_vmsa_ref(vcpu);
+
/*
* The LAUNCH_UPDATE_VMSA command will perform in-place encryption of
* the VMSA memory content (i.e it will write the same memory region
@@ -1237,6 +1245,9 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
fpstate_set_confidential(&vcpu->arch.guest_fpu);
vcpu->arch.guest_state_protected = true;
+ /* VMSA encrypted - put it into the VMCB */
+ svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(vcpu);
+
/*
* SEV-ES guest mandates LBR Virtualization to be _always_ ON. Enable it
* only after setting guest_state_protected because KVM_SET_MSRS allows
@@ -2689,12 +2700,14 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
kvm_for_each_vcpu(i, vcpu, kvm) {
struct vcpu_svm *svm = to_svm(vcpu);
- void *vmsa = sev_es_vmsa_ref(vcpu);
+ void *vmsa;
ret = sev_es_sync_vmsa(svm);
if (ret)
goto out;
+ vmsa = sev_es_vmsa_ref(vcpu);
+
ret = sev_es_vcpu_vmsa_make_private(vcpu);
if (ret)
goto out;
@@ -2710,6 +2723,10 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
}
svm->vcpu.arch.guest_state_protected = true;
+
+ /* VMSA encrypted - put it into the VMCB */
+ svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(vcpu);
+
/*
* SEV-ES (and thus SNP) guest mandates LBR Virtualization to
* be _always_ ON. Enable it only after setting
@@ -4914,22 +4931,11 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event)
int sev_vcpu_create(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- int ret;
mutex_init(&svm->sev_es.snp_vmsa_mutex);
- if (!is_sev_es_guest(vcpu))
- return 0;
-
- /*
- * SEV-ES guests require a separate (from the VMCB) VMSA page used to
- * contain the encrypted register state of the guest.
- */
- ret = sev_es_vcpu_alloc_vmsa(vcpu);
- if (ret)
- return ret;
-
- vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
+ if (is_sev_es_guest(vcpu))
+ vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
return 0;
}
--
2.53.0
* [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching
2026-06-11 12:35 [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests Jörg Rödel
2026-06-11 12:35 ` [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs Jörg Rödel
2026-06-11 12:35 ` [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage Jörg Rödel
@ 2026-06-11 12:35 ` Jörg Rödel
2026-06-11 13:05 ` sashiko-bot
2026-06-11 12:35 ` [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE Jörg Rödel
3 siblings, 1 reply; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 12:35 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm,
Joerg Roedel
From: Joerg Roedel <joerg.roedel@amd.com>
Introduce a way to provide a guest GPA as the initial BSP VMSA and
avoid allocating KVM-managed VMSAs in this case. Only one
guest-provided VMSA is supported at the moment because IGVM also
supports setting only a single VMSA.
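
As a rough sketch of the resulting per-VCPU decision, assuming a simplified
model rather than the real KVM structures: model_pick_vmsa(), enum
model_vmsa_kind and MODEL_INVALID_GPA are hypothetical names, with
MODEL_INVALID_GPA standing in for INVALID_PAGE.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for INVALID_PAGE: "no guest-provided VMSA was registered". */
#define MODEL_INVALID_GPA UINT64_MAX

enum model_vmsa_kind {
	MODEL_VMSA_NONE,  /* VCPU starts without any VMSA */
	MODEL_VMSA_KVM,   /* VMSA allocated and encrypted by the host */
	MODEL_VMSA_GUEST, /* VMSA already present in guest memory */
};

/* Decide which kind of initial VMSA a VCPU receives at launch. */
enum model_vmsa_kind model_pick_vmsa(unsigned int vcpu_idx,
				     uint64_t initial_vmsa_gpa)
{
	/* No guest-provided VMSA: every VCPU gets a host-managed one. */
	if (initial_vmsa_gpa == MODEL_INVALID_GPA)
		return MODEL_VMSA_KVM;

	/* Exactly one guest-provided VMSA exists, and it belongs to the BSP. */
	return vcpu_idx == 0 ? MODEL_VMSA_GUEST : MODEL_VMSA_NONE;
}
```

With a guest-provided VMSA, VCPU 0 uses it and all other VCPUs start without
one, so the APs presumably have to be brought up later by the guest itself.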
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
arch/x86/kvm/svm/sev.c | 62 ++++++++++++++++++++++++++++++------------
arch/x86/kvm/svm/svm.h | 1 +
2 files changed, 45 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 350bb97c32c0..88db83b3ff8e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -726,6 +726,7 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
INIT_LIST_HEAD(&sev->regions_list);
INIT_LIST_HEAD(&sev->mirror_vms);
+ sev->initial_vmsa_gpa = INVALID_PAGE;
sev->need_init = false;
kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV);
@@ -2680,6 +2681,46 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
return 0;
}
+static int snp_init_guest_vmsa(struct kvm_vcpu *vcpu, gpa_t vmsa_gpa)
+{
+ /* Only one initial guest VMSA can exist (per IGVM) - so it belongs to the BSP */
+ if (vcpu->vcpu_idx != 0)
+ return 0;
+
+ /* VMSA already private and encrypted via LAUNCH_UPDATE */
+ sev_es_set_guest_vmsa(vcpu, vmsa_gpa);
+
+ return 0;
+}
+
+static int snp_init_kvm_vmsa(struct kvm_vcpu *vcpu,
+ struct sev_data_snp_launch_update *data,
+ struct kvm_sev_cmd *argp)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+ void *vmsa;
+
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ vmsa = sev_es_vmsa_ref(vcpu);
+
+ ret = sev_es_vcpu_vmsa_make_private(vcpu);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data->address = __sme_pa(vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ data, &argp->error);
+ if (ret)
+ sev_snp_vcpu_reclaim_vmsa(vcpu);
+
+ return ret;
+}
+
static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
@@ -2700,28 +2741,13 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
kvm_for_each_vcpu(i, vcpu, kvm) {
struct vcpu_svm *svm = to_svm(vcpu);
- void *vmsa;
- ret = sev_es_sync_vmsa(svm);
+ ret = VALID_PAGE(sev->initial_vmsa_gpa) ?
+ snp_init_guest_vmsa(vcpu, sev->initial_vmsa_gpa) :
+ snp_init_kvm_vmsa(vcpu, &data, argp);
if (ret)
goto out;
- vmsa = sev_es_vmsa_ref(vcpu);
-
- ret = sev_es_vcpu_vmsa_make_private(vcpu);
- if (ret)
- goto out;
-
- /* Issue the SNP command to encrypt the VMSA */
- data.address = __sme_pa(vmsa);
- ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
- &data, &argp->error);
- if (ret) {
- sev_snp_vcpu_reclaim_vmsa(vcpu);
-
- goto out;
- }
-
svm->vcpu.arch.guest_state_protected = true;
/* VMSA encrypted - put it into the VMCB */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 3d4799f09b23..cc7e84c230bb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -117,6 +117,7 @@ struct kvm_sev_info {
struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */
cpumask_var_t have_run_cpus; /* CPUs that have done VMRUN for this VM. */
bool snp_certs_enabled; /* SNP certificate-fetching support. */
+ gpa_t initial_vmsa_gpa; /* Optional GPA of BSP VMSA - SEV-SNP only */
};
#endif
--
2.53.0
* [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
2026-06-11 12:35 [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests Jörg Rödel
` (2 preceding siblings ...)
2026-06-11 12:35 ` [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching Jörg Rödel
@ 2026-06-11 12:35 ` Jörg Rödel
2026-06-11 12:43 ` Sean Christopherson
2026-06-11 12:58 ` sashiko-bot
3 siblings, 2 replies; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 12:35 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm,
Joerg Roedel
From: Joerg Roedel <joerg.roedel@amd.com>
Support setting a VMSA in guest physical memory during the SEV-SNP
launch process. Only one VMSA can be provided, which is then used for
the BSP. None of the APs have a VMSA allocated or assigned when this
feature is used.
This ensures stable launch measurements on SEV-SNP which are
independent of the number of VCPUs the VM is launched with.
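
The acceptance rules for such a page can be summarized in a simplified
userspace model. This is a sketch under assumptions, not the kernel
implementation: struct model_vm, struct model_vmsa and model_accept_vmsa()
are hypothetical, and the model folds the checks into one function, whereas
the patch performs them at different points and records the GPA only after
the page has been populated successfully.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096u
#define MODEL_INVALID_GPA UINT64_MAX /* stand-in for INVALID_PAGE */

/* Only the two VMSA fields that the checks look at. */
struct model_vmsa {
	uint64_t sev_features;
	uint8_t vmpl;
};

struct model_vm {
	uint64_t vmsa_features;    /* features the VM was initialized with */
	uint64_t initial_vmsa_gpa; /* MODEL_INVALID_GPA until a VMSA is set */
};

/* Returns 0 and records the GPA if the VMSA page is acceptable. */
int model_accept_vmsa(struct model_vm *vm, const struct model_vmsa *vmsa,
		      uint64_t gpa, uint64_t len)
{
	/* A VMSA page may be provided only once per VM. */
	if (vm->initial_vmsa_gpa != MODEL_INVALID_GPA)
		return -EBUSY;

	/* A VMSA is exactly one page. */
	if (len != MODEL_PAGE_SIZE)
		return -EINVAL;

	/* The image must agree with the VM's configured SEV features. */
	if (vmsa->sev_features != vm->vmsa_features)
		return -EINVAL;

	/* The BSP must always boot from VMPL0. */
	if (vmsa->vmpl != 0)
		return -EINVAL;

	vm->initial_vmsa_gpa = gpa;
	return 0;
}
```

A second VMSA page is rejected with -EBUSY, which matches the "only one VMSA
can be provided" rule in the description above.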
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/sev.c | 44 ++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 1 +
include/uapi/linux/kvm.h | 1 +
4 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5f2b30d0405c..fc87a5ba295b 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -885,6 +885,7 @@ struct kvm_sev_snp_launch_start {
/* Kept in sync with firmware values for simplicity. */
#define KVM_SEV_PAGE_TYPE_INVALID 0x0
#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 88db83b3ff8e..90399d5d0358 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2520,6 +2520,20 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}
+static bool snp_check_launch_vmsa(struct kvm_sev_info *sev,
+ struct sev_es_save_area *vmsa)
+{
+ /* VMSA sev_features must match the VM's vmsa_features */
+ if (vmsa->sev_features != sev->vmsa_features)
+ return false;
+
+ /* Must always boot from VMPL0 */
+ if (vmsa->vmpl != 0)
+ return false;
+
+ return true;
+}
+
struct sev_gmem_populate_args {
__u8 type;
int sev_fd;
@@ -2532,7 +2546,9 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
struct sev_gmem_populate_args *sev_populate_args = opaque;
struct sev_data_snp_launch_update fw_args = {0};
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
+ gpa_t gpa = gfn << PAGE_SHIFT;
bool assigned = false;
+ u64 sev_features = 0;
int level;
int ret;
@@ -2550,14 +2566,27 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
if (src_page) {
void *src_vaddr = kmap_local_page(src_page);
void *dst_vaddr = kmap_local_pfn(pfn);
+ struct sev_es_save_area *vmsa = dst_vaddr;
+ bool accept_page = true;
memcpy(dst_vaddr, src_vaddr, PAGE_SIZE);
+ if (sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_VMSA) {
+ accept_page = snp_check_launch_vmsa(sev, vmsa);
+ if (accept_page)
+ sev_features = vmsa->sev_features;
+ }
+
kunmap_local(src_vaddr);
kunmap_local(dst_vaddr);
+
+ if (!accept_page) {
+ ret = -EINVAL;
+ goto out;
+ }
}
- ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, PG_LEVEL_4K,
+ ret = rmp_make_private(pfn, gpa, PG_LEVEL_4K,
sev_get_asid(kvm), true);
if (ret)
goto out;
@@ -2593,6 +2622,9 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
kunmap_local(dst_vaddr);
}
+ if (ret == 0 && sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_VMSA)
+ sev->initial_vmsa_gpa = gpa;
+
out:
if (ret)
pr_debug("%s: error updating GFN %llx, return code %d (fw_error %d)\n",
@@ -2620,12 +2652,22 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (!params.len || !PAGE_ALIGNED(params.len) || params.flags ||
(params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_VMSA &&
params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED &&
params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS &&
params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID))
return -EINVAL;
+ if (params.type == KVM_SEV_SNP_PAGE_TYPE_VMSA) {
+ /* A VMSA page is allowed only once */
+ if (sev->initial_vmsa_gpa != INVALID_PAGE)
+ return -EBUSY;
+ /* Can only deploy a single page as VMSA */
+ if (params.len != PAGE_SIZE)
+ return -EINVAL;
+ }
+
src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? NULL : u64_to_user_ptr(params.uaddr);
if (!PAGE_ALIGNED(src))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0550359ed798..dc9abe62476e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4870,6 +4870,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MEMORY_FAULT_INFO:
case KVM_CAP_X86_GUEST_MODE:
case KVM_CAP_ONE_REG:
+ case KVM_CAP_SNP_DIRECT_VMSA:
r = 1;
break;
case KVM_CAP_PRE_FAULT_MEMORY:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6c8afa2047bf..bf034435f98c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -996,6 +996,7 @@ struct kvm_enable_cap {
#define KVM_CAP_S390_USER_OPEREXEC 246
#define KVM_CAP_S390_KEYOP 247
#define KVM_CAP_S390_VSIE_ESAMODE 248
+#define KVM_CAP_SNP_DIRECT_VMSA 249
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.53.0
* Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
2026-06-11 12:35 ` [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE Jörg Rödel
@ 2026-06-11 12:43 ` Sean Christopherson
2026-06-11 13:23 ` Jörg Rödel
2026-06-11 12:58 ` sashiko-bot
1 sibling, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2026-06-11 12:43 UTC (permalink / raw)
To: Jörg Rödel
Cc: Paolo Bonzini, x86, Tom Lendacky, Michael Roth, kvm, linux-kernel,
coconut-svsm, Joerg Roedel
On Thu, Jun 11, 2026, Jörg Rödel wrote:
> From: Joerg Roedel <joerg.roedel@amd.com>
>
> Support setting a VMSA in guest physical memory during the SEV-SNP
> launch process. Only one VMSA can be provided, which is then used for
> the BSP. None of the APs have a VMSA allocated or assigned when this
> feature is used.
>
> This ensures stable launch measurements on SEV-SNP which are
> independent of the number of VCPUs the VM is launched with.
This needs a *much* longer explanation and more justification for exactly why
this needs to be handled in KVM. I understand most of the words and acronyms,
but that's about where my understanding stops.
* Re: [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs
2026-06-11 12:35 ` [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs Jörg Rödel
@ 2026-06-11 12:56 ` sashiko-bot
2026-06-11 14:13 ` Jörg Rödel
0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-11 12:56 UTC (permalink / raw)
To: Jörg Rödel; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Dynamically freeing an active initial VMSA page during SNP AP creation without using SNP_PAGE_RECLAIM triggers an RMP violation, which bug-halts the VM and leaks memory.
Pre-existing issues:
- [Critical] A malicious SEV-SNP guest can crash the host kernel by punching a hole in a guest_memfd page that is actively assigned as an AP VMSA.
--
commit dfce748c773ed97c4a6e2b3a1ea5718862d5c1bf
Author: Joerg Roedel <joerg.roedel@amd.com>
kvm: svm: Streamline VMSA setting for VCPUs
This commit streamlines the VMSA setting state of vcpus, allowing a VMSA
to be either KVM-allocated or guest-provided. It consolidates the various
tracking states around VMSAs.
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0c3962647116e..9e00aa7bdca8c 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
[ ... ]
> @@ -156,6 +159,173 @@ struct enc_region {
[ ... ]
> +static void sev_es_vcpu_free_vmsa(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> + void *vmsa_ptr;
> +
> + switch (svm->sev_es.vmsa.vmsa_state) {
> + case VMSA_NONE:
> + case VMSA_GUEST:
> + break;
> + case VMSA_PRIVATE:
> + vmsa_ptr = page_address(svm->sev_es.vmsa.vmsa_page);
> +
> + if (is_sev_snp_guest(vcpu)) {
> + u64 pfn = __pa(vmsa_ptr) >> PAGE_SHIFT;
> +
> + if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K)) {
[Severity: High]
Does this sequence trigger an RMP violation?
When an SEV-SNP guest provisions an AP and triggers svm_vmgexit_ap_creation(),
KVM replaces the initial KVM-managed VMSA with the guest-provided one using
sev_es_set_guest_vmsa(), which eagerly frees the old VMSA via
sev_es_vcpu_free_vmsa().
For SNP guests, the old VMSA page is in the VMSA_PRIVATE state and has been
transitioned to the hardware VMSA state. To transition it back to shared
memory, AMD architecture requires reclaiming the page via snp_page_reclaim()
before issuing an RMPUPDATE.
Since sev_es_vcpu_free_vmsa() bypasses snp_page_reclaim() and directly calls
kvm_rmp_make_shared(), could this fail with an RMP violation, causing the VM
to bug-halt and leak the physical page?
> + pr_err("Failed to make VMSA page shared - leaking it to avoid re-use\n");
> + goto out;
> + }
> + }
> +
> + if (vcpu->arch.guest_state_protected)
> + sev_flush_encrypted_page(vcpu, vmsa_ptr);
> +
> + fallthrough;
[ ... ]
> +static u64 sev_es_vmsa_pa(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> + enum vmsa_state vmsa_state = svm->sev_es.vmsa.vmsa_state;
> + u64 vmsa_pa = INVALID_PAGE;
> +
> + if (vmsa_state == VMSA_GUEST) {
[ ... ]
> + /*
> + * The new VMSA will be private memory guest memory, so retrieve the
> + * PFN from the gmem backend.
> + */
> + if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
> + goto out;
> +
> + vmsa_pa = pfn_to_hpa(pfn);
> +
> + /*
> + * gmem pages aren't currently migratable, but if this ever changes
> + * then care should be taken to ensure the guest vmsa is pinned
> + * through some other means.
> + */
> + kvm_release_page_clean(page);
[Severity: Critical]
This is a pre-existing issue, but does dropping the page reference here allow
a malicious guest to crash the host kernel?
When a guest-provided VMSA is used for an AP, KVM resolves the GPA to a PFN
and immediately drops its page reference here. The physical address is then
programmed into the hardware VMCB.
If a malicious guest issues a Page State Change request to transition this VMSA
page back to shared memory, host userspace can handle this by calling
fallocate(PUNCH_HOLE) on the guest_memfd. This would trigger
sev_gmem_invalidate(), which attempts rmp_make_shared().
Because the page is still in the VMSA state, rmp_make_shared() would fail.
sev_gmem_invalidate() only prints a warning and proceeds, freeing the page
back to the buddy allocator while its RMP entry is still guest-restricted.
When the host buddy allocator reallocates and accesses this page, would it
trigger an RMP violation and a host kernel panic?
> + } else if (vmsa_state == VMSA_PRIVATE || vmsa_state == VMSA_SHARED) {
> + vmsa_pa = __pa(page_address(svm->sev_es.vmsa.vmsa_page));
> + }
> +
> +out:
> + return vmsa_pa;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260611123528.572255-1-joro@8bytes.org?part=1
* Re: [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage
2026-06-11 12:35 ` [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage Jörg Rödel
@ 2026-06-11 12:58 ` sashiko-bot
2026-06-11 14:29 ` Jörg Rödel
0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-11 12:58 UTC (permalink / raw)
To: Jörg Rödel; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Missing VMSA cleanup on the error path in `snp_init_kvm_vmsa()` causes a guest-reachable `WARN_ON_ONCE` upon retry.
- [Medium] Late vCPU creation results in KVM's INVALID_PAGE sentinel being passed directly to the hardware PSP firmware when dumping VMCB.
--
commit 31275eaac0590092737efa334c2a33922a5b5fd9
Author: Joerg Roedel <joerg.roedel@amd.com>
kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage
Do not allocate a KVM-managed VMSA for all VCPUs on VCPU creation,
defer it to the LAUNCH_FINISH stage of SEV-ES and SEV-SNP. At this
stage the VMSAs get used for the first time.
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 9e00aa7bdca8c..0aa62cb802fe5 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
[ ... ]
> @@ -2689,12 +2700,14 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> struct vcpu_svm *svm = to_svm(vcpu);
> - void *vmsa = sev_es_vmsa_ref(vcpu);
> + void *vmsa;
>
> ret = sev_es_sync_vmsa(svm);
> if (ret)
> goto out;
>
> + vmsa = sev_es_vmsa_ref(vcpu);
> +
> ret = sev_es_vcpu_vmsa_make_private(vcpu);
> if (ret)
> goto out;
[Severity: Medium]
Does this error path leak the newly allocated VMSA?
Since sev_es_sync_vmsa() now allocates the VMSA, if
sev_es_vcpu_vmsa_make_private() fails, the code returns via the "out" label
without freeing the VMSA or resetting its state.
If userspace retries the KVM_SEV_SNP_LAUNCH_FINISH ioctl, sev_es_sync_vmsa()
is called again and will trigger the WARN_ON_ONCE() in
sev_es_vcpu_alloc_vmsa() because the state is still VMSA_SHARED:
if (WARN_ON_ONCE(svm->sev_es.vmsa.vmsa_state != VMSA_NONE))
return -EINVAL;
Can we add a call to sev_es_vcpu_free_vmsa() on this error path?
[ ... ]
> @@ -4889,22 +4906,11 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event)
> int sev_vcpu_create(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> - int ret;
>
> mutex_init(&svm->sev_es.snp_vmsa_mutex);
>
> - if (!is_sev_es_guest(vcpu))
> - return 0;
> -
> - /*
> - * SEV-ES guests require a separate (from the VMCB) VMSA page used to
> - * contain the encrypted register state of the guest.
> - */
> - ret = sev_es_vcpu_alloc_vmsa(vcpu);
> - if (ret)
> - return ret;
> -
> - vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
> + if (is_sev_es_guest(vcpu))
> + vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
>
> return 0;
> }
[Severity: Medium]
Does removing this allocation lead to KVM's INVALID_PAGE sentinel being passed
to the PSP firmware?
If userspace creates a vCPU after the SEV-ES or SEV-SNP launch sequence has
completed, the new vCPU's VMSA is never allocated.
When this vCPU is run, sev_es_vmsa_pa() returns KVM's INVALID_PAGE sentinel
(~0ull), which is loaded into the VMCB, triggering an immediate #VMEXIT
(SVM_EXIT_ERR).
If the dump_invalid_vmcb parameter is enabled, KVM handles this by calling
dump_vmcb(), which then calls sev_decrypt_vmsa(). Since sev_es_vmsa_ref()
returns NULL for VMSA_NONE, sev_decrypt_vmsa() erroneously assumes the VMSA
is encrypted and passes the INVALID_PAGE sentinel directly to the PSP
firmware as the source physical address for SEV_CMD_DBG_DECRYPT:
data.src_addr = svm->vmcb->control.vmsa_pa;
data.len = PAGE_SIZE;
ret = sev_issue_cmd(vcpu->kvm, SEV_CMD_DBG_DECRYPT, &data, &argp_err);
Could this direct hardware exposure to an invalid page address risk firmware
hangs or IOMMU faults?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260611123528.572255-1-joro@8bytes.org?part=2
* Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
2026-06-11 12:35 ` [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE Jörg Rödel
2026-06-11 12:43 ` Sean Christopherson
@ 2026-06-11 12:58 ` sashiko-bot
2026-06-11 15:23 ` Jörg Rödel
1 sibling, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-11 12:58 UTC (permalink / raw)
To: Jörg Rödel; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
New issues:
- [High] Missing `srcu_read_lock()` when calling `gfn_to_memslot()` in `sev_es_vmsa_pa()`, leading to a potential Use-After-Free of KVM memslots.
- [Low] The local variable `sev_features` in `sev_gmem_post_populate()` is assigned but never used.
Pre-existing issues:
- [Critical] KVM writes firmware data to a potentially read-only userspace page during CPUID error recovery, which could corrupt the global zero page.
- [High] Out-of-order `kunmap_local()` calls violate the highmem API's strict LIFO unmapping requirement.
- [High] A malicious host userspace process can trigger a kernel warning (`WARN_ON_ONCE`) by passing a NULL user address (`uaddr = 0`) when issuing `KVM_SEV_SNP_LAUNCH_UPDATE` for any non-ZERO page type.
--
commit f2475c503a445642b6525297ce3c8dd1d2a1067f
Author: Joerg Roedel <joerg.roedel@amd.com>
kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
This commit adds support for setting a VMSA in guest physical memory
during the SEV-SNP launch process. It introduces a new page type,
KVM_SEV_SNP_PAGE_TYPE_VMSA, which is used for the BSP.
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0c42af2e62016..52d0fa5ebdab1 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
[ ... ]
> @@ -2532,7 +2546,9 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> struct sev_gmem_populate_args *sev_populate_args = opaque;
> struct sev_data_snp_launch_update fw_args = {0};
> struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
> + gpa_t gpa = gfn << PAGE_SHIFT;
> bool assigned = false;
> + u64 sev_features = 0;
> int level;
> int ret;
[Severity: Low]
Is the sev_features variable necessary? It gets assigned below when
processing a KVM_SEV_SNP_PAGE_TYPE_VMSA page but doesn't appear to be
used anywhere else in this function.
[ ... ]
> @@ -2550,14 +2566,27 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> if (src_page) {
> void *src_vaddr = kmap_local_page(src_page);
> void *dst_vaddr = kmap_local_pfn(pfn);
> + struct sev_es_save_area *vmsa = dst_vaddr;
> + bool accept_page = true;
>
> memcpy(dst_vaddr, src_vaddr, PAGE_SIZE);
>
> + if (sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_VMSA) {
> + accept_page = snp_check_launch_vmsa(sev, vmsa);
> + if (accept_page)
> + sev_features = vmsa->sev_features;
> + }
>
> kunmap_local(src_vaddr);
> kunmap_local(dst_vaddr);
[Severity: High]
This is a pre-existing issue, but does this violate the LIFO unmapping
requirement of the highmem API?
The kmap_local_page() documentation requires mappings to be unmapped in the
exact reverse order. Here src_vaddr is mapped first, then dst_vaddr, but
they are unmapped in the same order instead of reverse. Could this corrupt
the internal stack on the next kmap/kunmap operations?
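To illustrate the ordering rule described above, here is a minimal userspace
model of a stack-based mapping scheme. It is only a sketch, not the kernel's
kmap_local implementation: model_kmap_stack, model_map_local() and
model_unmap_local() are hypothetical names, and the only property modeled is
that an unmap succeeds only for the most recent mapping.

```c
#include <stddef.h>

#define MODEL_KMAP_SLOTS 4

/* Userspace model of a per-context mapping stack (hypothetical, not kernel code). */
struct model_kmap_stack {
	const void *slot[MODEL_KMAP_SLOTS];
	size_t depth;
};

/* Push a mapping; returns 0 on success, -1 if the stack is full. */
static int model_map_local(struct model_kmap_stack *s, const void *addr)
{
	if (s->depth == MODEL_KMAP_SLOTS)
		return -1;
	s->slot[s->depth++] = addr;
	return 0;
}

/* Pop a mapping; returns 0 only if addr is the most recent mapping. */
static int model_unmap_local(struct model_kmap_stack *s, const void *addr)
{
	if (s->depth == 0 || s->slot[s->depth - 1] != addr)
		return -1;	/* out-of-order unmap is rejected */
	s->depth--;
	return 0;
}
```

In this model, unmapping src before dst is rejected, while unmapping dst first
and then src succeeds. That corresponds to swapping the two kunmap_local()
calls quoted above.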
[ ... ]
> @@ -2593,6 +2622,9 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> kunmap_local(dst_vaddr);
> }
>
> + if (ret == 0 && sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_VMSA)
> + sev->initial_vmsa_gpa = gpa;
> +
[Severity: High]
Will this new VMSA initialization path expose a use-after-free of KVM
memslots during launch finish?
By setting sev->initial_vmsa_gpa to a valid page here, the ioctl path
KVM_SEV_SNP_LAUNCH_FINISH can now call sev_es_vmsa_pa(), which looks up the
slot:
arch/x86/kvm/svm/sev.c:sev_es_vmsa_pa() {
...
slot = gfn_to_memslot(vcpu->kvm, gfn);
...
}
Since the launch finish ioctl takes kvm->lock and vcpu->mutex, but not
srcu_read_lock(), could a concurrent memslot update swap and free the
memslot before it is accessed in kvm_gmem_get_pfn()?
> out:
> if (ret)
> pr_debug("%s: error updating GFN %llx, return code %d (fw_error %d)\n",
> @@ -2620,12 +2652,22 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
> if (!params.len || !PAGE_ALIGNED(params.len) || params.flags ||
> (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
> + params.type != KVM_SEV_SNP_PAGE_TYPE_VMSA &&
> params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
> params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED &&
> params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS &&
> params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID))
> return -EINVAL;
[Severity: Critical]
This isn't a bug introduced by this patch, but does the error recovery path
for KVM_SEV_SNP_PAGE_TYPE_CPUID write to a potentially read-only page?
When handling the CPUID type, kvm_gmem_populate() calls get_user_pages_fast()
with gup_flags = 0, requesting read-only access. If userspace mapped a
read-only page like the global zero page, GUP will succeed.
If the firmware returns SEV_RET_INVALID_PARAM, sev_gmem_post_populate() maps
the page with kmap_local_page() and copies the expected values into it:
memcpy(src_vaddr, dst_vaddr, PAGE_SIZE);
Could this illegally overwrite read-only memory and corrupt the global zero
page?
> + if (params.type == KVM_SEV_SNP_PAGE_TYPE_VMSA) {
> + /* VMSA page are allowed only once */
> + if (sev->initial_vmsa_gpa != INVALID_PAGE)
> + return -EBUSY;
> + /* Can only deploy a single page as VMSA */
> + if (params.len != PAGE_SIZE)
> + return -EINVAL;
> + }
>
> src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? NULL : u64_to_user_ptr(params.uaddr);
[Severity: High]
This is a pre-existing issue, but can a host userspace process trigger a
kernel warning by passing a NULL user address (uaddr = 0) here?
If params.uaddr is 0, src becomes NULL and passes the PAGE_ALIGNED(src)
check. kvm_gmem_populate() skips fetching the user page and passes
src_page = NULL to sev_gmem_post_populate().
That function then unconditionally evaluates:
WARN_ON_ONCE(sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
!src_page)
Since the type isn't ZERO, won't this allow an unprivileged user to spam
the kernel log?
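As a rough illustration of the parameter checks discussed in this review, the
following userspace sketch models the validation order from the quoted hunk
and adds one hypothetical extra check that rejects a NULL uaddr for non-ZERO
page types. Everything prefixed with model_ or MODEL_ is an assumption made
for this example; the enum values are not the UAPI constants, and the extra
check is not part of the patch.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096ULL
#define MODEL_INVALID_GPA (~0ULL)

/* Reduced set of page types; values are arbitrary, not the UAPI ones. */
enum model_page_type {
	MODEL_TYPE_NORMAL,
	MODEL_TYPE_VMSA,
	MODEL_TYPE_ZERO,
	MODEL_TYPE_COUNT
};

struct model_params {
	uint64_t uaddr;
	uint64_t len;
	uint32_t type;
	uint32_t flags;
};

static int model_check_launch_update(const struct model_params *p,
				     uint64_t initial_vmsa_gpa)
{
	/* Generic checks: non-zero page-aligned length, no flags, known type. */
	if (!p->len || (p->len % MODEL_PAGE_SIZE) || p->flags ||
	    p->type >= MODEL_TYPE_COUNT)
		return -EINVAL;

	if (p->type == MODEL_TYPE_VMSA) {
		/* A VMSA page is allowed only once. */
		if (initial_vmsa_gpa != MODEL_INVALID_GPA)
			return -EBUSY;
		/* Only a single page can be deployed as VMSA. */
		if (p->len != MODEL_PAGE_SIZE)
			return -EINVAL;
	}

	/* Hypothetical addition: non-ZERO types need an aligned, non-NULL source. */
	if (p->type != MODEL_TYPE_ZERO &&
	    (!p->uaddr || (p->uaddr % MODEL_PAGE_SIZE)))
		return -EINVAL;

	return 0;
}
```

With a check like the last one, a request with uaddr = 0 would be refused
before any populate callback could observe a missing source page.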
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260611123528.572255-1-joro@8bytes.org?part=4
* Re: [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching
2026-06-11 12:35 ` [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching Jörg Rödel
@ 2026-06-11 13:05 ` sashiko-bot
2026-06-11 14:43 ` Jörg Rödel
0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-11 13:05 UTC (permalink / raw)
To: Jörg Rödel; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] Resolving the guest-provided VMSA GPA via `gfn_to_memslot()` occurs without holding `kvm->srcu` or `kvm->slots_lock`, violating KVM's memslot locking hierarchy.
Pre-existing issues:
- [Critical] Guest-provided VMSA pages are freed to the host buddy allocator while still in the RMP Firmware state, allowing userspace to trigger a host kernel panic (Denial of Service).
- [High] KVM-allocated VMSA pages for successfully launched SNP VMs are permanently leaked during VM teardown.
--
commit 7ccfa1323e073ebe13797e68f8f429e2d5dfb5ec
Author: Joerg Roedel <joerg.roedel@amd.com>
kvm: svm: Support guest-provided VMSA for launching
Introduce a way to provide a guest GPA as the initial BSP VMSA and
avoid allocating KVM-managed VMSAs in this case. Only one
guest-provided VMSA is supported at the moment as IGVM also only
supports to set a single VMSA.
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0aa62cb802fe5..0c42af2e62016 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
[ ... ]
> @@ -2680,6 +2681,46 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return 0;
> }
>
> +static int snp_init_guest_vmsa(struct kvm_vcpu *vcpu, gpa_t vmsa_gpa)
> +{
> + /* Only one initial guest VMSA can exist (per IGVM) - so it belongs to the BSP */
> + if (vcpu->vcpu_idx != 0)
> + return 0;
> +
> + /* VMSA already private and encrypted via LAUNCH_UPDATE */
> + sev_es_set_guest_vmsa(vcpu, vmsa_gpa);
> +
> + return 0;
> +}
[Severity: Critical]
This is a pre-existing issue, but can unprivileged userspace panic the host
kernel by punching a hole in the guest_memfd file backing the guest-provided
VMSA pages?
Guest-provided VMSA pages are placed into the VMSA (Firmware) RMP state.
However, KVM drops its page reference immediately after lookup:
arch/x86/kvm/svm/sev.c:sev_es_vmsa_pa() {
...
kvm_release_page_clean(page);
...
}
If userspace triggers a page release, the page is freed by the core MM
subsystem. During freeing, sev_gmem_invalidate() attempts to transition the
page via rmp_make_shared(), which architecturally fails on VMSA-state pages:
arch/x86/kvm/svm/sev.c:sev_gmem_invalidate() {
...
rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n", pfn, rc))
goto next_pfn;
...
}
Because the failure is only logged as a warning, the page is returned to the
buddy allocator in Firmware state. When the host reallocates and accesses
this page, it triggers a fatal RMP #PF and panics the kernel. KVM must
properly reclaim VMSA pages before allowing them to be freed.
> +
> +static int snp_init_kvm_vmsa(struct kvm_vcpu *vcpu,
> + struct sev_data_snp_launch_update *data,
> + struct kvm_sev_cmd *argp)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> + int ret;
> + void *vmsa;
> +
> + ret = sev_es_sync_vmsa(svm);
> + if (ret)
> + return ret;
> +
> + vmsa = sev_es_vmsa_ref(vcpu);
> +
> + ret = sev_es_vcpu_vmsa_make_private(vcpu);
> + if (ret)
> + return ret;
> +
> + /* Issue the SNP command to encrypt the VMSA */
> + data->address = __sme_pa(vmsa);
> + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> + data, &argp->error);
> + if (ret)
> + sev_snp_vcpu_reclaim_vmsa(vcpu);
> +
> + return ret;
> +}
[Severity: High]
This is a pre-existing issue, but does KVM permanently leak successfully
launched SNP VM VMSA pages during VM teardown?
During normal teardown of an SNP guest, KVM attempts to transition its
allocated VMSA pages to shared state via kvm_rmp_make_shared():
arch/x86/kvm/svm/sev.c:sev_es_vcpu_free_vmsa() {
...
if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K)) {
pr_err("Failed to make VMSA page shared - leaking it to avoid re-use\n");
goto out;
}
...
}
kvm_rmp_make_shared() uses the RMPUPDATE instruction, which architecturally
fails on pages that are in the VMSA state. The function detects the failure,
logs the error, and permanently leaks the page. Because VMs are destroyed
routinely, this will eventually exhaust host memory. KVM must use
snp_page_reclaim() instead of kvm_rmp_make_shared() for pages in the VMSA
state.
> static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> {
> struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
> @@ -2700,28 +2741,13 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> struct vcpu_svm *svm = to_svm(vcpu);
> - void *vmsa;
>
> - ret = sev_es_sync_vmsa(svm);
> + ret = VALID_PAGE(sev->initial_vmsa_gpa) ?
> + snp_init_guest_vmsa(vcpu, sev->initial_vmsa_gpa) :
> + snp_init_kvm_vmsa(vcpu, &data, argp);
> if (ret)
> goto out;
>
> - vmsa = sev_es_vmsa_ref(vcpu);
> -
> - ret = sev_es_vcpu_vmsa_make_private(vcpu);
> - if (ret)
> - goto out;
> -
> - /* Issue the SNP command to encrypt the VMSA */
> - data.address = __sme_pa(vmsa);
> - ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> - &data, &argp->error);
> - if (ret) {
> - sev_snp_vcpu_reclaim_vmsa(vcpu);
> -
> - goto out;
> - }
> -
> svm->vcpu.arch.guest_state_protected = true;
>
> /* VMSA encrypted - put it into the VMCB */
[Severity: High]
Does this code call sev_es_vmsa_pa() without holding kvm->srcu or
kvm->slots_lock?
Looking at the execution path immediately following this comment in
snp_launch_update_vmsa():
/* VMSA encrypted - put it into the VMCB */
svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(vcpu);
arch/x86/kvm/svm/sev.c:sev_es_vmsa_pa() {
...
slot = gfn_to_memslot(vcpu->kvm, gfn);
...
}
gfn_to_memslot() dereferences the KVM memslots array. Since this is called
without SRCU or slots_lock, a concurrent thread executing
KVM_SET_USER_MEMORY_REGION can swap and free the memslots array while
gfn_to_memslot() is accessing it, leading to a use-after-free.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260611123528.572255-1-joro@8bytes.org?part=3
* Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
2026-06-11 12:43 ` Sean Christopherson
@ 2026-06-11 13:23 ` Jörg Rödel
0 siblings, 0 replies; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 13:23 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, x86, Tom Lendacky, Michael Roth, kvm, linux-kernel,
coconut-svsm, Joerg Roedel
Hi Sean,
On Thu, Jun 11, 2026 at 05:43:05AM -0700, Sean Christopherson wrote:
> On Thu, Jun 11, 2026, Jörg Rödel wrote:
> > From: Joerg Roedel <joerg.roedel@amd.com>
> >
> > Support setting a VMSA in guest physical memory during the SEV-SNP
> > launch process. Only one VMSA can be provided which will then be used
> > for the BSP. All of the APs will not have a VMSA allocated or assigned
> > when this feature is used.
> >
> > This ensures stable launch measurements on SEV-SNP which are
> > independent of the number of VCPUs the VM is launched with.
>
> This needs a *much* longer explanation and more justification for exactly why
> this needs to be handled in KVM. I understand most of the words and acronyms,
> but that's about where my understanding stops.
Sure, how about:
For SEV-SNP VMs KVM currently allocates and measures one VMSA per VCPU into the
initial memory image. Historically this behavior comes from the SEV-ES
implementation, which has no concept of a guest-provided or guest-owned VMSA.
So on SEV-ES there is no other choice than allocating the VMSAs in KVM.
In contrast, on SEV-SNP each VMSA has a GPA assigned and is (in theory)
guest-owned, so the old SEV-ES behavior of letting KVM manage the
VMSAs causes several problems (especially together with IGVM loading)
and inefficiencies:
1. With the current KVM behavior the initial launch measurement depends
on the number of VCPUs the VM has assigned.
2. Current SEV-SNP guest code will not use the KVM-allocated VMSAs for
APs. Both EDK2 and the Linux kernel will allocate and provide their
own VMSA pages for every AP. So the current allocation dance KVM is
doing is useless for the APs.
3. The current behavior makes it impossible to implement the
IGVM-promise of a predictable launch measurement derived from only
the IGVM file and the target platform.
To solve these problems this patch adds support to measure an IGVM-provided
VMSA page into the initial SEV-SNP memory image. Only one VMSA page is
supported for now, which aligns with the IGVM requirement that each file can
only provide one VP-context. The VMSA will be checked by KVM for supported SEV
features and VMPL0 before being accepted.
When a VMSA page is measured in this way it will be used as the launch VMSA of
the BSP for the VM. For all other VCPUs KVM will not allocate or measure VMSA
pages, keeping the launch measurement in sync with the IGVM image. The guest
has to provide VMSAs for all APs it intends to use, which common guest
components already do anyway.
When the feature is not used the current behavior is preserved. The changes
have been tested together with the KVM planes patches and COCONUT-SVSM and
showed that using this feature leads to a launch measurement matching the
IGVM-prediction.
?
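To make problem 1 above concrete, here is a toy userspace sketch. It uses a
chained FNV-1a hash as a stand-in for the launch digest, which is NOT how the
firmware computes the real measurement; all toy_* names are hypothetical. The
only point is that measuring one VMSA per VCPU makes the result depend on the
VCPU count, while measuring a single image-provided VMSA does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy digest extension: chained FNV-1a, NOT the firmware algorithm. */
static uint64_t toy_extend(uint64_t digest, const uint8_t *data, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		digest ^= data[i];
		digest *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
	return digest;
}

/* Models the current behavior: one VMSA is measured per VCPU. */
static uint64_t toy_measure_per_vcpu(const uint8_t *image, size_t image_len,
				     const uint8_t *vmsa, size_t vmsa_len,
				     unsigned int nr_vcpus)
{
	uint64_t digest = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
	unsigned int i;

	digest = toy_extend(digest, image, image_len);
	for (i = 0; i < nr_vcpus; i++)
		digest = toy_extend(digest, vmsa, vmsa_len);
	return digest;
}

/* Models the proposed behavior: exactly one provided VMSA is measured. */
static uint64_t toy_measure_single_vmsa(const uint8_t *image, size_t image_len,
					const uint8_t *vmsa, size_t vmsa_len,
					unsigned int nr_vcpus)
{
	uint64_t digest = 0xcbf29ce484222325ULL;

	(void)nr_vcpus;	/* deliberately unused: the VCPU count is not measured */
	digest = toy_extend(digest, image, image_len);
	return toy_extend(digest, vmsa, vmsa_len);
}
```

In the second function the VCPU count never enters the digest, which is the
property an IGVM-derived prediction relies on.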
-Joerg
* Re: [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs
2026-06-11 12:56 ` sashiko-bot
@ 2026-06-11 14:13 ` Jörg Rödel
0 siblings, 0 replies; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 14:13 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On Thu, Jun 11, 2026 at 12:56:28PM +0000, sashiko-bot@kernel.org wrote:
> [Severity: High]
> Does this sequence trigger an RMP violation?
>
> When an SEV-SNP guest provisions an AP and triggers svm_vmgexit_ap_creation(),
> KVM replaces the initial KVM-managed VMSA with the guest-provided one using
> sev_es_set_guest_vmsa(), which eagerly frees the old VMSA via
> sev_es_vcpu_free_vmsa().
>
> For SNP guests, the old VMSA page is in the VMSA_PRIVATE state and has been
> transitioned to the hardware VMSA state. To transition it back to shared
> memory, AMD architecture requires reclaiming the page via snp_page_reclaim()
> before issuing an RMPUPDATE.
Calling snp_page_reclaim() is only needed when the page is reclaimed before
LAUNCH_FINISH. Once the VM is running, which is the case when it issues an
svm_vmgexit_ap_creation() request, an RMPUPDATE is sufficient.
> [Severity: Critical]
> This is a pre-existing issue, but does dropping the page reference here allow
> a malicious guest to crash the host kernel?
>
> When a guest-provided VMSA is used for an AP, KVM resolves the GPA to a PFN
> and immediately drops its page reference here. The physical address is then
> programmed into the hardware VMCB.
>
> If a malicious guest issues a Page State Change request to transition this VMSA
> page back to shared memory, host userspace can handle this by calling
> fallocate(PUNCH_HOLE) on the guest_memfd. This would trigger
> sev_gmem_invalidate(), which attempts rmp_make_shared().
>
> Because the page is still in the VMSA state, rmp_make_shared() would fail.
> sev_gmem_invalidate() only prints a warning and proceeds, freeing the page
> back to the buddy allocator while its RMP entry is still guest-restricted.
>
> When the host buddy allocator reallocates and accesses this page, would it
> trigger an RMP violation and a host kernel panic?
This is not an issue because the RMPUPDATE will only fail when the VMSA is
currently used by a CPU in guest-mode (== its VCPU is running).
But in the sev_es_vcpu_free_vmsa() path this cannot happen because this
function is always called in the context of the affected VCPU's thread, which
makes sure that the VCPU is not in guest-mode.
-Joerg
* Re: [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage
2026-06-11 12:58 ` sashiko-bot
@ 2026-06-11 14:29 ` Jörg Rödel
0 siblings, 0 replies; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 14:29 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On Thu, Jun 11, 2026 at 12:58:13PM +0000, sashiko-bot@kernel.org wrote:
> [Severity: Medium]
> Does this error path leak the newly allocated VMSA?
>
> Since sev_es_sync_vmsa() now allocates the VMSA, if
> sev_es_vcpu_vmsa_make_private() fails, the code returns via the "out" label
> without freeing the VMSA or resetting its state.
>
> If userspace retries the KVM_SEV_SNP_LAUNCH_FINISH ioctl, sev_es_sync_vmsa()
> is called again and will trigger the WARN_ON_ONCE() in
> sev_es_vcpu_alloc_vmsa() because the state is still VMSA_SHARED:
>
> if (WARN_ON_ONCE(svm->sev_es.vmsa.vmsa_state != VMSA_NONE))
> return -EINVAL;
>
> Can we add a call to sev_es_vcpu_free_vmsa() on this error path?
Yes, this is a user-space triggerable WARN_ON_ONCE. Usually user-space does not
re-issue a LAUNCH_FINISH command after it failed, and doing so is not
supposed to lead to anything productive anyway. The right fix is to remove the
WARN_ON_ONCE().
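For reference, the retry scenario can be reproduced with a small userspace
model of the state handling. This is a sketch under assumptions: the state
names mirror the patch, but the model_* functions are simplified stand-ins
that return -EINVAL where the kernel would also WARN, and the cleanup_on_error
switch only exists to compare both variants of the error path.

```c
#include <errno.h>

/* State names mirror the patch; everything else is a simplified model. */
enum vmsa_state { VMSA_NONE, VMSA_SHARED, VMSA_PRIVATE, VMSA_GUEST };

struct model_vcpu {
	enum vmsa_state vmsa_state;
};

/* Models the allocation step: only legal from VMSA_NONE. */
static int model_alloc_vmsa(struct model_vcpu *v)
{
	if (v->vmsa_state != VMSA_NONE)
		return -EINVAL;
	v->vmsa_state = VMSA_SHARED;
	return 0;
}

/* Models freeing a VMSA that has not been made private yet. */
static void model_free_vmsa(struct model_vcpu *v)
{
	if (v->vmsa_state == VMSA_SHARED)
		v->vmsa_state = VMSA_NONE;
}

/*
 * Models one LAUNCH_FINISH attempt for a single VCPU. make_private_fails
 * injects the failure; cleanup_on_error selects whether the error path
 * frees the VMSA again.
 */
static int model_launch_finish(struct model_vcpu *v, int make_private_fails,
			       int cleanup_on_error)
{
	int ret = model_alloc_vmsa(v);

	if (ret)
		return ret;

	if (make_private_fails) {
		if (cleanup_on_error)
			model_free_vmsa(v);
		return -EIO;
	}

	v->vmsa_state = VMSA_PRIVATE;
	return 0;
}
```

Without cleanup the state stays at VMSA_SHARED after the failure, so a second
attempt fails in the allocation step. With cleanup the state returns to
VMSA_NONE and a retry can proceed. Removing the WARN_ON_ONCE() would keep the
-EINVAL on retry but avoid the user-triggerable warning.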
> [Severity: Medium]
> Does removing this allocation lead to KVM's INVALID_PAGE sentinel being passed
> to the PSP firmware?
>
> If userspace creates a vCPU after the SEV-ES or SEV-SNP launch sequence has
> completed, the new vCPU's VMSA is never allocated.
>
> When this vCPU is run, sev_es_vmsa_pa() returns KVM's INVALID_PAGE sentinel
> (~0ull), which is loaded into the VMCB, triggering an immediate #VMEXIT
> (SVM_EXIT_ERR).
>
> If the dump_invalid_vmcb parameter is enabled, KVM handles this by calling
> dump_vmcb(), which then calls sev_decrypt_vmsa(). Since sev_es_vmsa_ref()
> returns NULL for VMSA_NONE, sev_decrypt_vmsa() erroneously assumes the VMSA
> is encrypted and passes the INVALID_PAGE sentinel directly to the PSP
> firmware as the source physical address for SEV_CMD_DBG_DECRYPT:
>
> data.src_addr = svm->vmcb->control.vmsa_pa;
> data.len = PAGE_SIZE;
>
> ret = sev_issue_cmd(vcpu->kvm, SEV_CMD_DBG_DECRYPT, &data, &argp_err);
>
> Could this direct hardware exposure to an invalid page address risk firmware
> hangs or IOMMU faults?
This is not true. The VMSAs are allocated in sev_es_sync_vmsa() for both SEV-ES
and SEV-SNP guests. Both code paths also propagate the VMSA into the VMCB before
LAUNCH_FINISH is complete and the VM can be invoked.
-Joerg
* Re: [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching
2026-06-11 13:05 ` sashiko-bot
@ 2026-06-11 14:43 ` Jörg Rödel
0 siblings, 0 replies; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 14:43 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On Thu, Jun 11, 2026 at 01:05:36PM +0000, sashiko-bot@kernel.org wrote:
> [Severity: Critical]
> This is a pre-existing issue, but can unprivileged userspace panic the host
> kernel by punching a hole in the guest_memfd file backing the guest-provided
> VMSA pages?
>
> Guest-provided VMSA pages are placed into the VMSA (Firmware) RMP state.
> However, KVM drops its page reference immediately after lookup:
>
> arch/x86/kvm/svm/sev.c:sev_es_vmsa_pa() {
> ...
> kvm_release_page_clean(page);
> ...
> }
At least there is no reference counting issue here, as GMEM still holds a
reference to the page.
> If userspace triggers a page release, the page is freed by the core MM
> subsystem. During freeing, sev_gmem_invalidate() attempts to transition the
> page via rmp_make_shared(), which architecturally fails on VMSA-state pages:
Not sure about that, I haven't checked whether GMEM allows these kinds of
page-state changes before LAUNCH_FINISH. After LAUNCH_FINISH it is not a problem
anymore: if the VMSA address in the VMCB does not point to a valid VMSA page,
VMRUN will fail.
> [Severity: High]
> This is a pre-existing issue, but does KVM permanently leak successfully
> launched SNP VM VMSA pages during VM teardown?
>
> During normal teardown of an SNP guest, KVM attempts to transition its
> allocated VMSA pages to shared state via kvm_rmp_make_shared():
>
> arch/x86/kvm/svm/sev.c:sev_es_vcpu_free_vmsa() {
> ...
> if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K)) {
> pr_err("Failed to make VMSA page shared - leaking it to avoid re-use\n");
> goto out;
> }
> ...
> }
>
> kvm_rmp_make_shared() uses the RMPUPDATE instruction, which architecturally
> fails on pages that are in the VMSA state. The function detects the failure,
> logs the error, and permanently leaks the page. Because VMs are destroyed
> routinely, this will eventually exhaust host memory. KVM must use
> snp_page_reclaim() instead of kvm_rmp_make_shared() for pages in the VMSA
> state.
Wrong. VMSA pages are freed when they are KVM-owned; when they are guest-owned,
they are freed via gmem. This is all fine if it happens after LAUNCH_FINISH and
no VCPU of the VM is still running.
> [Severity: High]
> Does this code call sev_es_vmsa_pa() without holding kvm->srcu or
> kvm->slots_lock?
>
> Looking at the execution path immediately following this comment in
> snp_launch_update_vmsa():
>
> /* VMSA encrypted - put it into the VMCB */
> svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(vcpu);
>
> arch/x86/kvm/svm/sev.c:sev_es_vmsa_pa() {
> ...
> slot = gfn_to_memslot(vcpu->kvm, gfn);
> ...
> }
>
> gfn_to_memslot() dereferences the KVM memslots array. Since this is called
> without SRCU or slots_lock, a concurrent thread executing
> KVM_SET_USER_MEMORY_REGION can swap and free the memslots array while
> gfn_to_memslot() is accessing it, leading to a use-after-free.
Not sure if this is an issue. If it is, then it is pre-existing. I think the
worst that can happen is a failed VMRUN because the VMSAs become invalid, no?
-Joerg
* Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
2026-06-11 12:58 ` sashiko-bot
@ 2026-06-11 15:23 ` Jörg Rödel
0 siblings, 0 replies; 15+ messages in thread
From: Jörg Rödel @ 2026-06-11 15:23 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On Thu, Jun 11, 2026 at 12:58:49PM +0000, sashiko-bot@kernel.org wrote:
> [Severity: Low]
> Is the sev_features variable necessary? It gets assigned below when
> processing a KVM_SEV_SNP_PAGE_TYPE_VMSA page but doesn't appear to be
> used anywhere else in this function.
Yes, that is a left-over and can be removed.
> [Severity: High]
> This is a pre-existing issue, but does this violate the LIFO unmapping
> requirement of the highmem API?
>
> The kmap_local_page() documentation requires mappings to be unmapped in the
> exact reverse order. Here src_vaddr is mapped first, then dst_vaddr, but
> they are unmapped in the same order instead of reverse. Could this corrupt
> the internal stack on the next kmap/kunmap operations?
Right, I will fix it in this patch.
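To illustrate the ordering rule discussed above, here is a minimal userspace
sketch. It is not the kernel code and not the eventual fix: model_kmap_local(),
model_kunmap_local(), model_copy_back() and MODEL_PAGE_SIZE are hypothetical
stand-ins that only model a stack of local mappings, assuming the last mapping
created must be the first one released.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE  64	/* small stand-in for PAGE_SIZE */
#define MODEL_KMAP_DEPTH 8	/* arbitrary depth for this model */

/* Hypothetical model of a stack of local mappings (not the kernel's). */
void *model_kmap_stack[MODEL_KMAP_DEPTH];
int model_kmap_top;

/* Stand-in for kmap_local_page(): push the mapping and return its address. */
void *model_kmap_local(void *page)
{
	assert(model_kmap_top < MODEL_KMAP_DEPTH);
	model_kmap_stack[model_kmap_top++] = page;
	return page;
}

/*
 * Stand-in for kunmap_local(): succeeds only when vaddr is the most
 * recently created mapping; returns -1 on an out-of-order unmap.
 */
int model_kunmap_local(void *vaddr)
{
	if (model_kmap_top == 0 ||
	    model_kmap_stack[model_kmap_top - 1] != vaddr)
		return -1;
	model_kmap_top--;
	return 0;
}

/* Map src, then dst, copy dst into src, and unmap in reverse order. */
int model_copy_back(unsigned char *src_page, unsigned char *dst_page)
{
	void *src_vaddr = model_kmap_local(src_page);
	void *dst_vaddr = model_kmap_local(dst_page);
	int ret;

	memcpy(src_vaddr, dst_vaddr, MODEL_PAGE_SIZE);

	ret = model_kunmap_local(dst_vaddr);	/* mapped last, unmapped first */
	if (ret)
		return ret;
	return model_kunmap_local(src_vaddr);	/* mapped first, unmapped last */
}
```

In this model, releasing src_vaddr before dst_vaddr is rejected, which is the
pattern the review comment points at.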
> [Severity: High]
> Will this new VMSA initialization path expose a use-after-free of KVM
> memslots during launch finish?
>
> By setting sev->initial_vmsa_gpa to a valid page here, the ioctl path
> KVM_SEV_SNP_LAUNCH_FINISH can now call sev_es_vmsa_pa(), which looks up the
> slot:
>
> arch/x86/kvm/svm/sev.c:sev_es_vmsa_pa() {
> ...
> slot = gfn_to_memslot(vcpu->kvm, gfn);
> ...
> }
>
> Since the launch finish ioctl takes kvm->lock and vcpu->mutex, but not
> srcu_read_lock(), could a concurrent memslot update swap and free the
> memslot before it is accessed in kvm_gmem_get_pfn()?
Will further investigate this. If there is an issue it is independent of this
series.
> [Severity: Critical]
> This isn't a bug introduced by this patch, but does the error recovery path
> for KVM_SEV_SNP_PAGE_TYPE_CPUID write to a potentially read-only page?
>
> When handling the CPUID type, kvm_gmem_populate() calls get_user_pages_fast()
> with gup_flags = 0, requesting read-only access. If userspace mapped a
> read-only page like the global zero page, GUP will succeed.
>
> If the firmware returns SEV_RET_INVALID_PARAM, sev_gmem_post_populate() maps
> the page with kmap_local_page() and copies the expected values into it:
>
> memcpy(src_vaddr, dst_vaddr, PAGE_SIZE);
>
> Could this illegally overwrite read-only memory and corrupt the global zero
> page?
Looks like this can happen, but it is not introduced by this series. It needs a
separate fix.
> [Severity: High]
> This is a pre-existing issue, but can a host userspace process trigger a
> kernel warning by passing a NULL user address (uaddr = 0) here?
>
> If params.uaddr is 0, src becomes NULL and passes the PAGE_ALIGNED(src)
> check. kvm_gmem_populate() skips fetching the user page and passes
> src_page = NULL to sev_gmem_post_populate().
>
> That function then unconditionally evaluates:
>
> WARN_ON_ONCE(sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
> !src_page)
>
> Since the type isn't ZERO, won't this allow an unprivileged user to spam
> the kernel log?
This is correct. We will send a separate fix.
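One possible shape for such a check, as a userspace sketch only: it assumes the
fix rejects a zero source address up front for every page type other than the
zero type, so the WARN is never reached. The names model_page_type,
model_check_uaddr() and MODEL_UADDR_ALIGN are hypothetical, and the enum values
are arbitrary rather than the UAPI ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the page types; values are arbitrary here. */
enum model_page_type {
	MODEL_PAGE_TYPE_NORMAL,
	MODEL_PAGE_TYPE_ZERO,
	MODEL_PAGE_TYPE_VMSA,
};

#define MODEL_UADDR_ALIGN 4096u	/* stand-in for PAGE_SIZE */

/*
 * Validate the user-supplied source address before any page is populated.
 * Returns 0 if the request may proceed, -EINVAL otherwise.
 */
int model_check_uaddr(enum model_page_type type, uint64_t uaddr)
{
	/* The source must be page aligned; note that 0 passes this test. */
	if (uaddr & (MODEL_UADDR_ALIGN - 1))
		return -EINVAL;

	/* Only the zero page type may omit the source page. */
	if (!uaddr && type != MODEL_PAGE_TYPE_ZERO)
		return -EINVAL;

	return 0;
}
```

With a check like this, a request with uaddr = 0 and a non-zero page type fails
with -EINVAL instead of tripping the warning.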
-Joerg