* [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests
@ 2026-06-11 12:35 Jörg Rödel
From: Jörg Rödel @ 2026-06-11 12:35 UTC
To: Sean Christopherson, Paolo Bonzini
Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm,
Joerg Roedel
From: Joerg Roedel <joerg.roedel@amd.com>
Hi,

Here is a set of patches that allows the VMM to provide a VMSA directly
to KVM, which will then use it for the BSP of the SEV-SNP VM. The
use case is IGVM loading, where the IGVM file contains a VMSA image
which must be loaded into the initial memory image of the VM as-is to
guarantee the expected launch measurement.

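The VMSA must be loaded as-is because of how launch measurement works. Each page added at launch is folded into a running digest, so any byte that differs in the VMSA changes the final value.

The following is a toy model only:
- FNV-1a stands in for the SHA-384 digest that the SNP firmware uses.
- The helper names toy_launch_update() and toy_measure() are hypothetical.
- The page-type values and guest addresses are hypothetical.
- None of this reflects the firmware's real per-page structure.

```c
#include <assert.h>	/* used by the usage checks */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Hypothetical page types for this toy model, not the SNP encoding. */
enum toy_page_type { TOY_PAGE_NORMAL = 1, TOY_PAGE_VMSA = 2 };

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* FNV-1a (64-bit), a stand-in for the real SHA-384 launch digest. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

/*
 * Fold one page into the running digest: the new value depends on the
 * previous digest, the page type, the guest address and the contents.
 */
static uint64_t toy_launch_update(uint64_t digest, enum toy_page_type type,
				  uint64_t gpa, const unsigned char *page)
{
	uint64_t h = fnv1a(FNV_OFFSET, &digest, sizeof(digest));
	uint32_t t = type;

	h = fnv1a(h, &t, sizeof(t));
	h = fnv1a(h, &gpa, sizeof(gpa));
	return fnv1a(h, page, PAGE_SIZE);
}

/* Measure a two-page image: one code page, then one VMSA page. */
static uint64_t toy_measure(const unsigned char *code, const unsigned char *vmsa)
{
	uint64_t digest = 0;

	digest = toy_launch_update(digest, TOY_PAGE_NORMAL, 0x100000, code);
	digest = toy_launch_update(digest, TOY_PAGE_VMSA, 0xfffff000, vmsa);
	return digest;
}
```

Measuring identical pages twice yields the same value. Flipping one byte of the VMSA page yields a different one. This is why a VMM that builds its own VMSA, instead of using the image from the IGVM file, cannot reproduce the expected measurement.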
The first patch reworks guest-VMSA handling and streamlines the state
handling to make it clearer and more maintainable. That patch accounts
for the biggest part of the changes.

I have tested these changes together with the planes patches and
COCONUT-SVSM and can confirm that the launch measurement is correct
again with these changes.

The changes are based on previous work by Roy Hopkins[1].

Please review.

Thanks,
Joerg

[1] https://github.com/torvalds/linux/commit/e00e081276b2cd9f1400ec5b1a9cd97f8b5c4d58

Joerg Roedel (4):
kvm: svm: Streamline VMSA setting for VCPUs
kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage
kvm: svm: Support guest-provided VMSA for launching
kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/sev.c | 419 +++++++++++++++++++++++---------
arch/x86/kvm/svm/svm.h | 32 ++-
arch/x86/kvm/x86.c | 1 +
include/uapi/linux/kvm.h | 1 +
5 files changed, 337 insertions(+), 117 deletions(-)
--
2.53.0
* [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs
@ 2026-06-11 12:35 Jörg Rödel
From: Jörg Rödel @ 2026-06-11 12:35 UTC
To: Sean Christopherson, Paolo Bonzini
Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm,
Joerg Roedel
From: Joerg Roedel <joerg.roedel@amd.com>

Streamline the VMSA setting state of vcpus, where a VMSA can be either
KVM-allocated or guest-provided. This consolidates the various
tracking state around VMSAs.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm/sev.c | 301 ++++++++++++++++++++++++++++-------------
 arch/x86/kvm/svm/svm.h |  31 ++++-
 2 files changed, 237 insertions(+), 95 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6c6a6d663e29..9b1280222e20 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -147,6 +147,9 @@ static bool sev_snp_guest(struct kvm *kvm)
 }
 
 static int snp_decommission_context(struct kvm *kvm);
+static int kvm_rmp_make_shared(struct kvm *kvm, u64 pfn, enum pg_level level);
+static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va);
+static int snp_page_reclaim(struct kvm *kvm, u64 pfn);
 
 struct enc_region {
 	struct list_head list;
@@ -156,6 +159,173 @@ struct enc_region {
 	unsigned long size;
 };
 
+static void *sev_es_vmsa_ref(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	void *vmsa = NULL;
+
+	if (svm->sev_es.vmsa.vmsa_state == VMSA_SHARED) {
+		vmsa = page_address(svm->sev_es.vmsa.vmsa_page);
+	}
+
+	return vmsa;
+}
+
+static int sev_es_vcpu_alloc_vmsa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct page *vmsa_page;
+
+	if (WARN_ON_ONCE(svm->sev_es.vmsa.vmsa_state != VMSA_NONE))
+		return -EINVAL;
+
+	/*
+	 * SEV-ES guests require a separate (from the VMCB) VMSA page used to
+	 * contain the encrypted register state of the guest.
+	 */
+	vmsa_page = snp_safe_alloc_page();
+	if (!vmsa_page)
+		return -ENOMEM;
+
+	svm->sev_es.vmsa.vmsa_state = VMSA_SHARED;
+	svm->sev_es.vmsa.vmsa_page = vmsa_page;
+
+	return 0;
+}
+
+static int sev_es_vcpu_vmsa_make_private(struct kvm_vcpu *vcpu)
+{
+	struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
+	struct vcpu_svm *svm = to_svm(vcpu);
+	void *vmsa = sev_es_vmsa_ref(vcpu);
+
+	if (!vmsa)
+		return -EINVAL;
+
+	if (is_sev_snp_guest(vcpu)) {
+		u64 pfn = __pa(vmsa) >> PAGE_SHIFT;
+		int ret;
+
+		/* Transition the VMSA page to a firmware state. */
+		ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+		if (ret)
+			return ret;
+	}
+
+	svm->sev_es.vmsa.vmsa_state = VMSA_PRIVATE;
+
+	return 0;
+}
+
+static void sev_es_vcpu_free_vmsa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	void *vmsa_ptr;
+
+	switch (svm->sev_es.vmsa.vmsa_state) {
+	case VMSA_NONE:
+	case VMSA_GUEST:
+		break;
+	case VMSA_PRIVATE:
+		vmsa_ptr = page_address(svm->sev_es.vmsa.vmsa_page);
+
+		if (is_sev_snp_guest(vcpu)) {
+			u64 pfn = __pa(vmsa_ptr) >> PAGE_SHIFT;
+
+			if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K)) {
+				pr_err("Failed to make VMSA page shared - leaking it to avoid re-use\n");
+				goto out;
+			}
+		}
+
+		if (vcpu->arch.guest_state_protected)
+			sev_flush_encrypted_page(vcpu, vmsa_ptr);
+
+		fallthrough;
+	case VMSA_SHARED:
+		__free_page(svm->sev_es.vmsa.vmsa_page);
+		break;
+	default:
+		BUG();
+	}
+out:
+
+	svm->sev_es.vmsa.vmsa_page = NULL;
+	svm->sev_es.vmsa.vmsa_state = VMSA_NONE;
+}
+
+static void sev_snp_vcpu_reclaim_vmsa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	void *vmsa_ptr;
+	u64 pfn;
+
+	if (WARN_ON_ONCE(!is_sev_snp_guest(vcpu) ||
+			 svm->sev_es.vmsa.vmsa_state != VMSA_PRIVATE))
+		return;
+
+	vmsa_ptr = page_address(svm->sev_es.vmsa.vmsa_page);
+	pfn = __pa(vmsa_ptr) >> PAGE_SHIFT;
+
+	if (!snp_page_reclaim(vcpu->kvm, pfn))
+		__free_page(svm->sev_es.vmsa.vmsa_page);
+
+	svm->sev_es.vmsa.vmsa_page = NULL;
+	svm->sev_es.vmsa.vmsa_state = VMSA_NONE;
+}
+
+static void sev_es_set_guest_vmsa(struct kvm_vcpu *vcpu, gpa_t vmsa_gpa)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	sev_es_vcpu_free_vmsa(vcpu);
+
+	svm->sev_es.vmsa.vmsa_state = VMSA_GUEST;
+	svm->sev_es.vmsa.vmsa_gpa = vmsa_gpa;
+}
+
+static u64 sev_es_vmsa_pa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	enum vmsa_state vmsa_state = svm->sev_es.vmsa.vmsa_state;
+	u64 vmsa_pa = INVALID_PAGE;
+
+	if (vmsa_state == VMSA_GUEST) {
+		gpa_t vmsa_gpa = svm->sev_es.vmsa.vmsa_gpa;
+		struct kvm_memory_slot *slot;
+		struct page *page;
+		kvm_pfn_t pfn;
+		gfn_t gfn;
+
+		gfn = gpa_to_gfn(vmsa_gpa);
+
+		slot = gfn_to_memslot(vcpu->kvm, gfn);
+		if (!slot)
+			goto out;
+
+		/*
+		 * The new VMSA will be private memory guest memory, so retrieve the
+		 * PFN from the gmem backend.
+		 */
+		if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
+			goto out;
+
+		vmsa_pa = pfn_to_hpa(pfn);
+
+		/*
+		 * gmem pages aren't currently migratable, but if this ever changes
+		 * then care should be taken to ensure the guest vmsa is pinned
+		 * through some other means.
+		 */
+		kvm_release_page_clean(page);
+	} else if (vmsa_state == VMSA_PRIVATE || vmsa_state == VMSA_SHARED) {
+		vmsa_pa = __pa(page_address(svm->sev_es.vmsa.vmsa_page));
+	}
+
+out:
+	return vmsa_pa;
+}
+
 /* Called with the sev_bitmap_lock held, or on shutdown */
 static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid)
 {
@@ -925,7 +1095,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 	struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
-	struct sev_es_save_area *save = svm->sev_es.vmsa;
+	struct sev_es_save_area *save = sev_es_vmsa_ref(vcpu);
 	struct xregs_state *xsave;
 	const u8 *s;
 	u8 *d;
@@ -1026,6 +1196,7 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
 {
 	struct sev_data_launch_update_vmsa vmsa;
 	struct vcpu_svm *svm = to_svm(vcpu);
+	void *vmsa_ref = sev_es_vmsa_ref(vcpu);
 	int ret;
 
 	if (vcpu->guest_debug) {
@@ -1043,15 +1214,19 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
 	 * the VMSA memory content (i.e it will write the same memory region
 	 * with the guest's key), so invalidate it first.
 	 */
-	clflush_cache_range(svm->sev_es.vmsa, PAGE_SIZE);
+	clflush_cache_range(vmsa_ref, PAGE_SIZE);
 
 	vmsa.reserved = 0;
 	vmsa.handle = to_kvm_sev_info(kvm)->handle;
-	vmsa.address = __sme_pa(svm->sev_es.vmsa);
+	vmsa.address = __sme_pa(vmsa_ref);
 	vmsa.len = PAGE_SIZE;
 	ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, &vmsa, error);
 	if (ret)
-		return ret;
+		goto free_vmsa;
+
+	ret = sev_es_vcpu_vmsa_make_private(vcpu);
+	if (ret)
+		goto free_vmsa;
 
 	/*
 	 * SEV-ES guests maintain an encrypted version of their FPU
@@ -1069,7 +1244,13 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
 	 * MSR_IA32_DEBUGCTLMSR when guest_state_protected is not set.
 	 */
 	svm_enable_lbrv(vcpu);
+
 	return 0;
+
+free_vmsa:
+	sev_es_vcpu_free_vmsa(vcpu);
+
+	return ret;
 }
 
 static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -2508,23 +2689,22 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct vcpu_svm *svm = to_svm(vcpu);
-		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+		void *vmsa = sev_es_vmsa_ref(vcpu);
 
 		ret = sev_es_sync_vmsa(svm);
 		if (ret)
 			goto out;
 
-		/* Transition the VMSA page to a firmware state. */
-		ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+		ret = sev_es_vcpu_vmsa_make_private(vcpu);
 		if (ret)
 			goto out;
 
 		/* Issue the SNP command to encrypt the VMSA */
-		data.address = __sme_pa(svm->sev_es.vmsa);
+		data.address = __sme_pa(vmsa);
 		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
 				      &data, &argp->error);
 		if (ret) {
-			snp_page_reclaim(kvm, pfn);
+			sev_snp_vcpu_reclaim_vmsa(vcpu);
 
 			goto out;
 		}
@@ -3593,31 +3773,13 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 
 void sev_free_vcpu(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm;
+	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (!is_sev_es_guest(vcpu))
 		return;
 
-	svm = to_svm(vcpu);
-
-	/*
-	 * If it's an SNP guest, then the VMSA was marked in the RMP table as
-	 * a guest-owned page. Transition the page to hypervisor state before
-	 * releasing it back to the system.
-	 */
-	if (is_sev_snp_guest(vcpu)) {
-		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
-
-		if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K))
-			goto skip_vmsa_free;
-	}
-
-	if (vcpu->arch.guest_state_protected)
-		sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
-
-	__free_page(virt_to_page(svm->sev_es.vmsa));
+	sev_es_vcpu_free_vmsa(vcpu);
 
-skip_vmsa_free:
 	__sev_es_unmap_ghcb(svm);
 }
 
@@ -4067,10 +4229,7 @@ static int snp_begin_psc(struct vcpu_svm *svm)
 static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	struct kvm_memory_slot *slot;
-	struct page *page;
-	kvm_pfn_t pfn;
-	gfn_t gfn;
+	u64 vmsa_pa;
 
 	guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
 
@@ -4092,46 +4251,17 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 	 */
 	vmcb_mark_all_dirty(svm->vmcb);
 
-	if (!VALID_PAGE(svm->sev_es.snp_vmsa_gpa))
-		return;
-
-	gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
-	svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	sev_es_set_guest_vmsa(vcpu, svm->sev_es.req_vmsa_gpa);
+	vmsa_pa = sev_es_vmsa_pa(vcpu);
 
-	slot = gfn_to_memslot(vcpu->kvm, gfn);
-	if (!slot)
+	if (!VALID_PAGE(vmsa_pa))
 		return;
 
-	/*
-	 * The new VMSA will be private memory guest memory, so retrieve the
-	 * PFN from the gmem backend.
-	 */
-	if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
-		return;
-
-	/*
-	 * From this point forward, the VMSA will always be a guest-mapped page
-	 * rather than the initial one allocated by KVM in svm->sev_es.vmsa. In
-	 * theory, svm->sev_es.vmsa could be free'd and cleaned up here, but
-	 * that involves cleanups like flushing caches, which would ideally be
-	 * handled during teardown rather than guest boot. Deferring that also
-	 * allows the existing logic for SEV-ES VMSAs to be re-used with
-	 * minimal SNP-specific changes.
-	 */
-	svm->sev_es.snp_has_guest_vmsa = true;
-
 	/* Use the new VMSA */
-	svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+	svm->vmcb->control.vmsa_pa = vmsa_pa;
 
 	/* Mark the vCPU as runnable */
 	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
-	/*
-	 * gmem pages aren't currently migratable, but if this ever changes
-	 * then care should be taken to ensure svm->sev_es.vmsa is pinned
-	 * through some other means.
-	 */
-	kvm_release_page_clean(page);
 }
 
 static int sev_snp_ap_creation(struct vcpu_svm *svm)
@@ -4187,10 +4317,10 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 			return -EINVAL;
 		}
 
-		target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		target_svm->sev_es.req_vmsa_gpa = svm->vmcb->control.exit_info_2;
 		break;
 	case SVM_VMGEXIT_AP_DESTROY:
-		target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+		target_svm->sev_es.req_vmsa_gpa = INVALID_PAGE;
 		break;
 	default:
 		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
@@ -4708,20 +4838,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm, bool init_event)
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
 	svm->vmcb->control.misc_ctl |= SVM_MISC_ENABLE_SEV_ES;
-
-	/*
-	 * An SEV-ES guest requires a VMSA area that is a separate from the
-	 * VMCB page. Do not include the encryption mask on the VMSA physical
-	 * address since hardware will access it using the guest key. Note,
-	 * the VMSA will be NULL if this vCPU is the destination for intrahost
-	 * migration, and will be copied later.
-	 */
-	if (!svm->sev_es.snp_has_guest_vmsa) {
-		if (svm->sev_es.vmsa)
-			svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
-		else
-			svm->vmcb->control.vmsa_pa = INVALID_PAGE;
-	}
+	svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(&svm->vcpu);
 
 	if (cpu_feature_enabled(X86_FEATURE_ALLOWED_SEV_FEATURES))
 		svm->vmcb->control.allowed_sev_features = sev->vmsa_features |
@@ -4797,7 +4914,7 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event)
 int sev_vcpu_create(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	struct page *vmsa_page;
+	int ret;
 
 	mutex_init(&svm->sev_es.snp_vmsa_mutex);
 
@@ -4808,11 +4925,9 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu)
 	 * SEV-ES guests require a separate (from the VMCB) VMSA page used to
 	 * contain the encrypted register state of the guest.
 	 */
-	vmsa_page = snp_safe_alloc_page();
-	if (!vmsa_page)
-		return -ENOMEM;
-
-	svm->sev_es.vmsa = page_address(vmsa_page);
+	ret = sev_es_vcpu_alloc_vmsa(vcpu);
+	if (ret)
+		return ret;
 
 	vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
 
@@ -5227,12 +5342,14 @@ struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
 	if (!is_sev_es_guest(vcpu))
 		return NULL;
 
+	vmsa = sev_es_vmsa_ref(vcpu);
+
 	/*
 	 * If the VMSA has not yet been encrypted, return a pointer to the
 	 * current un-encrypted VMSA.
 	 */
-	if (!vcpu->arch.guest_state_protected)
-		return (struct vmcb_save_area *)svm->sev_es.vmsa;
+	if (vmsa)
+		return vmsa;
 
 	sev = to_kvm_sev_info(vcpu->kvm);
 
@@ -5303,8 +5420,10 @@ struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
 
 void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa)
 {
+	struct vmcb_save_area *vmsa_ptr = sev_es_vmsa_ref(vcpu);
+
 	/* If the VMSA has not yet been encrypted, nothing was allocated */
-	if (!vcpu->arch.guest_state_protected || !vmsa)
+	if (vmsa == vmsa_ptr)
 		return;
 
 	free_page((unsigned long)vmsa);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5137416be593..3d4799f09b23 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -240,9 +240,29 @@ struct svm_nested_state {
 	bool force_msr_bitmap_recalc;
 };
 
+enum vmsa_state {
+	/* No VMSA set */
+	VMSA_NONE,
+	/* VMSA allocated by KVM - Shared in RMP (if applicable) */
+	VMSA_SHARED,
+	/* VMSA allocated by KVM - Guest-private in RMP (SEV-SNP only) */
+	VMSA_PRIVATE,
+	/* Guest-owned VMSA */
+	VMSA_GUEST,
+};
+
+struct sev_es_vmsa_state {
+	enum vmsa_state vmsa_state;
+	union {
+		/* state == (KVM_SHARED || KVM_PRIVATE) */
+		struct page *vmsa_page;
+		/* state == GUEST */
+		gpa_t vmsa_gpa;
+	};
+};
+
 struct vcpu_sev_es_state {
 	/* SEV-ES support */
-	struct sev_es_save_area *vmsa;
 	struct ghcb *ghcb;
 	u8 valid_bitmap[16];
 	struct kvm_host_map ghcb_map;
@@ -266,10 +286,13 @@ struct vcpu_sev_es_state {
 
 	u64 ghcb_registered_gpa;
 
-	struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
-	gpa_t snp_vmsa_gpa;
+	/* VMSA related state */
+	struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+	struct sev_es_vmsa_state vmsa; /* VMSA currently used by the VCPU */
+	gpa_t req_vmsa_gpa; /* Requested new VMSA GPA */
+
+	bool snp_ap_runnable;
 	bool snp_ap_waiting_for_reset;
-	bool snp_has_guest_vmsa;
 };
 
 struct vcpu_svm {
--
2.53.0

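The ownership rules that this patch encodes in enum vmsa_state can be exercised outside the kernel with a small userspace model. This is a sketch under stated assumptions:
- calloc() and free() replace snp_safe_alloc_page() and __free_page().
- The RMP transitions and cache flushes are omitted.
- The model_* names are hypothetical.

The model keeps the patch's central invariant: only a KVM-allocated page that is still shared is returned by the reference helper. In the patch, sev_decrypt_vmsa() treats a non-NULL result from sev_es_vmsa_ref() as meaning the VMSA has not been encrypted yet.

```c
#include <assert.h>	/* used by the usage checks */
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096

/* The four states of enum vmsa_state from the svm.h hunk above. */
enum vmsa_state {
	VMSA_NONE,	/* no VMSA set */
	VMSA_SHARED,	/* KVM-allocated, still shared */
	VMSA_PRIVATE,	/* KVM-allocated, made private (SNP: in the RMP) */
	VMSA_GUEST,	/* guest-owned, tracked by GPA only */
};

/* Userspace stand-in for struct sev_es_vmsa_state. */
struct model_vmsa {
	enum vmsa_state state;
	union {
		void *page;	/* VMSA_SHARED or VMSA_PRIVATE */
		uint64_t gpa;	/* VMSA_GUEST */
	};
};

/* Like sev_es_vmsa_ref(): only a still-shared KVM page is accessible. */
static void *model_ref(const struct model_vmsa *v)
{
	return v->state == VMSA_SHARED ? v->page : NULL;
}

/* Like sev_es_vcpu_alloc_vmsa(): NONE -> SHARED. */
static int model_alloc(struct model_vmsa *v)
{
	if (v->state != VMSA_NONE)
		return -1;	/* the patch WARNs and returns -EINVAL */
	v->page = calloc(1, MODEL_PAGE_SIZE);
	if (!v->page)
		return -1;
	v->state = VMSA_SHARED;
	return 0;
}

/* Like sev_es_vcpu_vmsa_make_private(): SHARED -> PRIVATE (RMP step omitted). */
static int model_make_private(struct model_vmsa *v)
{
	if (!model_ref(v))
		return -1;
	v->state = VMSA_PRIVATE;
	return 0;
}

/* Free a KVM-owned page; a guest-owned VMSA is only forgotten, never freed. */
static void model_free(struct model_vmsa *v)
{
	if (v->state == VMSA_SHARED || v->state == VMSA_PRIVATE)
		free(v->page);
	v->page = NULL;
	v->state = VMSA_NONE;
}

/* Like sev_es_set_guest_vmsa(): drop any KVM page, then record the GPA. */
static void model_set_guest(struct model_vmsa *v, uint64_t gpa)
{
	model_free(v);
	v->state = VMSA_GUEST;
	v->gpa = gpa;
}
```

A typical lifecycle is allocate, then make private, then switch to a guest-provided VMSA. Along the way, model_ref() returns NULL once the page is no longer shared, and a second allocation attempt is rejected.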
* Re: [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs 2026-06-11 12:35 ` [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs Jörg Rödel @ 2026-06-16 20:52 ` Tom Lendacky 0 siblings, 0 replies; 12+ messages in thread From: Tom Lendacky @ 2026-06-16 20:52 UTC (permalink / raw) To: Jörg Rödel, Sean Christopherson, Paolo Bonzini Cc: x86, Michael Roth, kvm, linux-kernel, coconut-svsm, Joerg Roedel On 6/11/26 07:35, Jörg Rödel wrote: > From: Joerg Roedel <joerg.roedel@amd.com> > > Streamline the VMSA setting state of vcpus, where a VMSA can be either > KVM-allocated or guest-provided. This consolidates the various > tracking state around VMSAs. > > Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> > --- > arch/x86/kvm/svm/sev.c | 301 ++++++++++++++++++++++++++++------------- > arch/x86/kvm/svm/svm.h | 31 ++++- > 2 files changed, 237 insertions(+), 95 deletions(-) > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 6c6a6d663e29..9b1280222e20 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -147,6 +147,9 @@ static bool sev_snp_guest(struct kvm *kvm) > } > > static int snp_decommission_context(struct kvm *kvm); > +static int kvm_rmp_make_shared(struct kvm *kvm, u64 pfn, enum pg_level level); > +static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va); > +static int snp_page_reclaim(struct kvm *kvm, u64 pfn); Can this be worked so that we don't add more forward declarations? > > struct enc_region { > struct list_head list; > @@ -156,6 +159,173 @@ struct enc_region { > unsigned long size; > }; > > +static void *sev_es_vmsa_ref(struct kvm_vcpu *vcpu) sev_es_get_kvm_shared_vmsa ? 
> +{ > + struct vcpu_svm *svm = to_svm(vcpu); > + void *vmsa = NULL; > + > + if (svm->sev_es.vmsa.vmsa_state == VMSA_SHARED) { > + vmsa = page_address(svm->sev_es.vmsa.vmsa_page); > + } > + > + return vmsa; How about just: if (svm->sev_es.vmsa.vmsa_state == VMSA_SHARED) return page_address(svm->sev_es.vmsa.vmsa_page); return NULL; > +} > + > +static int sev_es_vcpu_alloc_vmsa(struct kvm_vcpu *vcpu) sev_es_alloc_kvm_vmsa ? > +{ > + struct vcpu_svm *svm = to_svm(vcpu); > + struct page *vmsa_page; > + > + if (WARN_ON_ONCE(svm->sev_es.vmsa.vmsa_state != VMSA_NONE)) > + return -EINVAL; > + > + /* > + * SEV-ES guests require a separate (from the VMCB) VMSA page used to > + * contain the encrypted register state of the guest. > + */ > + vmsa_page = snp_safe_alloc_page(); > + if (!vmsa_page) > + return -ENOMEM; > + > + svm->sev_es.vmsa.vmsa_state = VMSA_SHARED; > + svm->sev_es.vmsa.vmsa_page = vmsa_page; > + > + return 0; > +} > + > +static int sev_es_vcpu_vmsa_make_private(struct kvm_vcpu *vcpu) sev_es_make_kvm_vmsa_private ? > +{ > + struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm); > + struct vcpu_svm *svm = to_svm(vcpu); > + void *vmsa = sev_es_vmsa_ref(vcpu); > + > + if (!vmsa) > + return -EINVAL; > + > + if (is_sev_snp_guest(vcpu)) { > + u64 pfn = __pa(vmsa) >> PAGE_SHIFT; > + int ret; > + > + /* Transition the VMSA page to a firmware state. */ > + ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true); > + if (ret) > + return ret; > + } > + > + svm->sev_es.vmsa.vmsa_state = VMSA_PRIVATE; > + > + return 0; > +} > + > +static void sev_es_vcpu_free_vmsa(struct kvm_vcpu *vcpu) sev_es_free_kvm_vmsa ? 
> +{ > + struct vcpu_svm *svm = to_svm(vcpu); > + void *vmsa_ptr; s/vmsa_ptr/vmsa/ > + > + switch (svm->sev_es.vmsa.vmsa_state) { > + case VMSA_NONE: > + case VMSA_GUEST: > + break; > + case VMSA_PRIVATE: > + vmsa_ptr = page_address(svm->sev_es.vmsa.vmsa_page); > + > + if (is_sev_snp_guest(vcpu)) { > + u64 pfn = __pa(vmsa_ptr) >> PAGE_SHIFT; PHYS_PFN(__pa(vmsa)); > + > + if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K)) { > + pr_err("Failed to make VMSA page shared - leaking it to avoid re-use\n"); > + goto out; s/goto out/break/ > + } > + } > + > + if (vcpu->arch.guest_state_protected) > + sev_flush_encrypted_page(vcpu, vmsa_ptr); > + > + fallthrough; > + case VMSA_SHARED: > + __free_page(svm->sev_es.vmsa.vmsa_page); > + break; > + default: > + BUG(); WARN_ON or WARN_ON_ONCE() instead of BUG(). > + } > +out: If using the 'break' above, then no need for this label. > + > + svm->sev_es.vmsa.vmsa_page = NULL; > + svm->sev_es.vmsa.vmsa_state = VMSA_NONE; > +} > + > +static void sev_snp_vcpu_reclaim_vmsa(struct kvm_vcpu *vcpu) sev_snp_reclaim_kvm_vmsa ? > +{ > + struct vcpu_svm *svm = to_svm(vcpu); > + void *vmsa_ptr; s/vmsa_ptr/vmsa/ > + u64 pfn; > + > + if (WARN_ON_ONCE(!is_sev_snp_guest(vcpu) || > + svm->sev_es.vmsa.vmsa_state != VMSA_PRIVATE)) > + return; > + > + vmsa_ptr = page_address(svm->sev_es.vmsa.vmsa_page); > + pfn = __pa(vmsa_ptr) >> PAGE_SHIFT; PHYS_PFN() > + > + if (!snp_page_reclaim(vcpu->kvm, pfn)) > + __free_page(svm->sev_es.vmsa.vmsa_page); Should you issue a message here similar to above about leaking the page if snp_page_reclaim() fails? 
> + > + svm->sev_es.vmsa.vmsa_page = NULL; > + svm->sev_es.vmsa.vmsa_state = VMSA_NONE; > +} > + > +static void sev_es_set_guest_vmsa(struct kvm_vcpu *vcpu, gpa_t vmsa_gpa) > +{ > + struct vcpu_svm *svm = to_svm(vcpu); > + > + sev_es_vcpu_free_vmsa(vcpu); > + > + svm->sev_es.vmsa.vmsa_state = VMSA_GUEST; > + svm->sev_es.vmsa.vmsa_gpa = vmsa_gpa; > +} > + > +static u64 sev_es_vmsa_pa(struct kvm_vcpu *vcpu) > +{ > + struct vcpu_svm *svm = to_svm(vcpu); > + enum vmsa_state vmsa_state = svm->sev_es.vmsa.vmsa_state; > + u64 vmsa_pa = INVALID_PAGE; > + > + if (vmsa_state == VMSA_GUEST) { Would a switch statement like you have in sev_es_vcpu_free_vmsa() be more consistent? Also, you could return INVALID_PAGE directly instead of having the goto's. switch (svm->sev_es.vmsa.vmsa_state) { case VMSA_NONE: return INVALID_PAGE; case VMSA_SHARED: case VMSA_PRIVATE: return __pa(page_address(svm->sev_es.vmsa.vmsa_page)); case VMSA_GUEST: ... } > + gpa_t vmsa_gpa = svm->sev_es.vmsa.vmsa_gpa; > + struct kvm_memory_slot *slot; > + struct page *page; > + kvm_pfn_t pfn; > + gfn_t gfn; > + > + gfn = gpa_to_gfn(vmsa_gpa); > + > + slot = gfn_to_memslot(vcpu->kvm, gfn); > + if (!slot) > + goto out; > + > + /* > + * The new VMSA will be private memory guest memory, so retrieve the > + * PFN from the gmem backend. > + */ > + if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL)) > + goto out; > + > + vmsa_pa = pfn_to_hpa(pfn); > + > + /* > + * gmem pages aren't currently migratable, but if this ever changes > + * then care should be taken to ensure the guest vmsa is pinned > + * through some other means. 
> + */ > + kvm_release_page_clean(page); > + } else if (vmsa_state == VMSA_PRIVATE || vmsa_state == VMSA_SHARED) { > + vmsa_pa = __pa(page_address(svm->sev_es.vmsa.vmsa_page)); > + } > + > +out: > + return vmsa_pa; > +} > + > /* Called with the sev_bitmap_lock held, or on shutdown */ > static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid) > { > @@ -925,7 +1095,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm) > { > struct kvm_vcpu *vcpu = &svm->vcpu; > struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm); > - struct sev_es_save_area *save = svm->sev_es.vmsa; > + struct sev_es_save_area *save = sev_es_vmsa_ref(vcpu); > struct xregs_state *xsave; > const u8 *s; > u8 *d; > @@ -1026,6 +1196,7 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, > { > struct sev_data_launch_update_vmsa vmsa; > struct vcpu_svm *svm = to_svm(vcpu); > + void *vmsa_ref = sev_es_vmsa_ref(vcpu); Might make more sense rename the sev_data_launch_update_vmsa variable to vmsa_lu and keep this as just vmsa ?\ > int ret; > > if (vcpu->guest_debug) { > @@ -1043,15 +1214,19 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, > * the VMSA memory content (i.e it will write the same memory region > * with the guest's key), so invalidate it first. > */ > - clflush_cache_range(svm->sev_es.vmsa, PAGE_SIZE); > + clflush_cache_range(vmsa_ref, PAGE_SIZE); > > vmsa.reserved = 0; > vmsa.handle = to_kvm_sev_info(kvm)->handle; > - vmsa.address = __sme_pa(svm->sev_es.vmsa); > + vmsa.address = __sme_pa(vmsa_ref); > vmsa.len = PAGE_SIZE; > ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, &vmsa, error); > if (ret) > - return ret; > + goto free_vmsa; > + > + ret = sev_es_vcpu_vmsa_make_private(vcpu); > + if (ret) > + goto free_vmsa; Similar to the SNP path, you can probably move this to before the LAUNCH_UPDATE, just for consistency. 
> > /* > * SEV-ES guests maintain an encrypted version of their FPU > @@ -1069,7 +1244,13 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, > * MSR_IA32_DEBUGCTLMSR when guest_state_protected is not set. > */ > svm_enable_lbrv(vcpu); > + > return 0; > + > +free_vmsa: > + sev_es_vcpu_free_vmsa(vcpu); > + > + return ret; > } > > static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) > @@ -2508,23 +2689,22 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) > > kvm_for_each_vcpu(i, vcpu, kvm) { > struct vcpu_svm *svm = to_svm(vcpu); > - u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT; > + void *vmsa = sev_es_vmsa_ref(vcpu); > > ret = sev_es_sync_vmsa(svm); > if (ret) > goto out; > > - /* Transition the VMSA page to a firmware state. */ > - ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true); > + ret = sev_es_vcpu_vmsa_make_private(vcpu); > if (ret) > goto out; > > /* Issue the SNP command to encrypt the VMSA */ > - data.address = __sme_pa(svm->sev_es.vmsa); > + data.address = __sme_pa(vmsa); > ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, > &data, &argp->error); > if (ret) { > - snp_page_reclaim(kvm, pfn); > + sev_snp_vcpu_reclaim_vmsa(vcpu); > > goto out; > } It might be nice to break the loop contents out into its own function like is done with ES. I haven't looked ahead in the series to see if that would make it easier to supply/measure a single VMSA or not, though. > @@ -3593,31 +3773,13 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm) > > void sev_free_vcpu(struct kvm_vcpu *vcpu) > { > - struct vcpu_svm *svm; > + struct vcpu_svm *svm = to_svm(vcpu); You can probably reduce the churn here by keeping this line unchanged and not deleting the assignment below. > > if (!is_sev_es_guest(vcpu)) > return; > > - svm = to_svm(vcpu); > - > - /* > - * If it's an SNP guest, then the VMSA was marked in the RMP table as > - * a guest-owned page. 
Transition the page to hypervisor state before > - * releasing it back to the system. > - */ > - if (is_sev_snp_guest(vcpu)) { > - u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT; > - > - if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K)) > - goto skip_vmsa_free; > - } > - > - if (vcpu->arch.guest_state_protected) > - sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa); > - > - __free_page(virt_to_page(svm->sev_es.vmsa)); > + sev_es_vcpu_free_vmsa(vcpu); > > -skip_vmsa_free: > __sev_es_unmap_ghcb(svm); > } > > @@ -4067,10 +4229,7 @@ static int snp_begin_psc(struct vcpu_svm *svm) > static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > - struct kvm_memory_slot *slot; > - struct page *page; > - kvm_pfn_t pfn; > - gfn_t gfn; > + u64 vmsa_pa; > > guard(mutex)(&svm->sev_es.snp_vmsa_mutex); > > @@ -4092,46 +4251,17 @@ static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) > */ > vmcb_mark_all_dirty(svm->vmcb); > > - if (!VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) > - return; > - > - gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa); > - svm->sev_es.snp_vmsa_gpa = INVALID_PAGE; > + sev_es_set_guest_vmsa(vcpu, svm->sev_es.req_vmsa_gpa); > + vmsa_pa = sev_es_vmsa_pa(vcpu); > > - slot = gfn_to_memslot(vcpu->kvm, gfn); > - if (!slot) > + if (!VALID_PAGE(vmsa_pa)) > return; > > - /* > - * The new VMSA will be private memory guest memory, so retrieve the > - * PFN from the gmem backend. > - */ > - if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL)) > - return; > - > - /* > - * From this point forward, the VMSA will always be a guest-mapped page > - * rather than the initial one allocated by KVM in svm->sev_es.vmsa. In > - * theory, svm->sev_es.vmsa could be free'd and cleaned up here, but > - * that involves cleanups like flushing caches, which would ideally be > - * handled during teardown rather than guest boot. 
Deferring that also > - * allows the existing logic for SEV-ES VMSAs to be re-used with > - * minimal SNP-specific changes. > - */ > - svm->sev_es.snp_has_guest_vmsa = true; > - > /* Use the new VMSA */ > - svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn); > + svm->vmcb->control.vmsa_pa = vmsa_pa; > > /* Mark the vCPU as runnable */ > kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE); > - > - /* > - * gmem pages aren't currently migratable, but if this ever changes > - * then care should be taken to ensure svm->sev_es.vmsa is pinned > - * through some other means. > - */ > - kvm_release_page_clean(page); > } > > static int sev_snp_ap_creation(struct vcpu_svm *svm) > @@ -4187,10 +4317,10 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm) > return -EINVAL; > } > > - target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2; > + target_svm->sev_es.req_vmsa_gpa = svm->vmcb->control.exit_info_2; > break; > case SVM_VMGEXIT_AP_DESTROY: > - target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE; > + target_svm->sev_es.req_vmsa_gpa = INVALID_PAGE; > break; > default: > vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n", > @@ -4708,20 +4838,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm, bool init_event) > struct vmcb *vmcb = svm->vmcb01.ptr; > > svm->vmcb->control.misc_ctl |= SVM_MISC_ENABLE_SEV_ES; > - > - /* > - * An SEV-ES guest requires a VMSA area that is a separate from the > - * VMCB page. Do not include the encryption mask on the VMSA physical > - * address since hardware will access it using the guest key. Note, > - * the VMSA will be NULL if this vCPU is the destination for intrahost > - * migration, and will be copied later. 
> - */ > - if (!svm->sev_es.snp_has_guest_vmsa) { > - if (svm->sev_es.vmsa) > - svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa); > - else > - svm->vmcb->control.vmsa_pa = INVALID_PAGE; > - } > + svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(&svm->vcpu); > > if (cpu_feature_enabled(X86_FEATURE_ALLOWED_SEV_FEATURES)) > svm->vmcb->control.allowed_sev_features = sev->vmsa_features | > @@ -4797,7 +4914,7 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event) > int sev_vcpu_create(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > - struct page *vmsa_page; > + int ret; > > mutex_init(&svm->sev_es.snp_vmsa_mutex); > > @@ -4808,11 +4925,9 @@ int sev_vcpu_create(struct kvm_vcpu *vcpu) > * SEV-ES guests require a separate (from the VMCB) VMSA page used to > * contain the encrypted register state of the guest. > */ > - vmsa_page = snp_safe_alloc_page(); > - if (!vmsa_page) > - return -ENOMEM; > - > - svm->sev_es.vmsa = page_address(vmsa_page); > + ret = sev_es_vcpu_alloc_vmsa(vcpu); > + if (ret) > + return ret; > > vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm); > > @@ -5227,12 +5342,14 @@ struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu) > if (!is_sev_es_guest(vcpu)) > return NULL; > > + vmsa = sev_es_vmsa_ref(vcpu); > + > /* > * If the VMSA has not yet been encrypted, return a pointer to the > * current un-encrypted VMSA. 
> */ > - if (!vcpu->arch.guest_state_protected) > - return (struct vmcb_save_area *)svm->sev_es.vmsa; > + if (vmsa) > + return vmsa; > > sev = to_kvm_sev_info(vcpu->kvm); > > @@ -5303,8 +5420,10 @@ struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu) > > void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa) > { > + struct vmcb_save_area *vmsa_ptr = sev_es_vmsa_ref(vcpu); > + > /* If the VMSA has not yet been encrypted, nothing was allocated */ > - if (!vcpu->arch.guest_state_protected || !vmsa) > + if (vmsa == vmsa_ptr) Can't this just be if (sev_es_vmsa_ref(vcpu)) And then you save defining a new variable. > return; > > free_page((unsigned long)vmsa); > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h > index 5137416be593..3d4799f09b23 100644 > --- a/arch/x86/kvm/svm/svm.h > +++ b/arch/x86/kvm/svm/svm.h > @@ -240,9 +240,29 @@ struct svm_nested_state { > bool force_msr_bitmap_recalc; > }; > > +enum vmsa_state { > + /* No VMSA set */ > + VMSA_NONE, > + /* VMSA allocated by KVM - Shared in RMP (if applicable) */ > + VMSA_SHARED, VMSA_KVM_SHARED > + /* VMSA allocated by KVM - Guest-private in RMP (SEV-SNP only) */ > + VMSA_PRIVATE, VMSA_KVM_PRIVATE > + /* Guest-owned VMSA */ > + VMSA_GUEST, VMSA_GUEST_OWNED This way ownership of the page becomes more obvious when used in the code. > +}; > + > +struct sev_es_vmsa_state { > + enum vmsa_state vmsa_state; > + union { > + /* state == (KVM_SHARED || KVM_PRIVATE) */ > + struct page *vmsa_page; > + /* state == GUEST */ > + gpa_t vmsa_gpa; > + }; > +}; > + > struct vcpu_sev_es_state { > /* SEV-ES support */ > - struct sev_es_save_area *vmsa; > struct ghcb *ghcb; > u8 valid_bitmap[16]; > struct kvm_host_map ghcb_map; > @@ -266,10 +286,13 @@ struct vcpu_sev_es_state { > > u64 ghcb_registered_gpa; > > - struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. 
*/ > - gpa_t snp_vmsa_gpa; > + /* VMSA related state */ > + struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */ > + struct sev_es_vmsa_state vmsa; /* VMSA currently used by the VCPU */ > + gpa_t req_vmsa_gpa; /* Requested new VMSA GPA */ > + > + bool snp_ap_runnable; I don't see this used in the patch anywhere. Thanks, Tom > bool snp_ap_waiting_for_reset; > - bool snp_has_guest_vmsa; > }; > > struct vcpu_svm {
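To make the ownership discussion above concrete, here is a minimal user-space sketch (not KVM code) of the tagged VMSA state, assuming the enumerator names suggested in the review. struct vmsa_tracking, vmsa_ref() and decrypted_copy_needs_free() are hypothetical: vmsa_ref() models sev_es_vmsa_ref() returning a host pointer only while the KVM-allocated page is still shared, and decrypted_copy_needs_free() models the simpler `if (sev_es_vmsa_ref(vcpu))` early return proposed for sev_free_decrypted_vmsa().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gpa_t; /* hypothetical stand-in for the kernel type */

/* Enumerator names follow the review suggestion: the owner is in the name. */
enum vmsa_state {
	VMSA_NONE,        /* no VMSA assigned yet */
	VMSA_KVM_SHARED,  /* KVM-allocated, still shared (unencrypted) */
	VMSA_KVM_PRIVATE, /* KVM-allocated, guest-private in the RMP */
	VMSA_GUEST_OWNED, /* guest-provided, tracked only by its GPA */
};

struct vmsa_tracking {
	enum vmsa_state state;
	union {
		void *page; /* valid for VMSA_KVM_SHARED and VMSA_KVM_PRIVATE */
		gpa_t gpa;  /* valid for VMSA_GUEST_OWNED */
	};
};

static bool vmsa_is_kvm_owned(const struct vmsa_tracking *t)
{
	return t->state == VMSA_KVM_SHARED || t->state == VMSA_KVM_PRIVATE;
}

/* Like sev_es_vmsa_ref(): a host pointer exists only while the page is shared. */
static void *vmsa_ref(const struct vmsa_tracking *t)
{
	assert(t->state != VMSA_KVM_SHARED || t->page != NULL);
	return t->state == VMSA_KVM_SHARED ? t->page : NULL;
}

/*
 * Models the simplified check for sev_free_decrypted_vmsa(): when a shared
 * VMSA exists the caller received the live page, so nothing must be freed.
 */
static bool decrypted_copy_needs_free(const struct vmsa_tracking *t,
				      const void *copy)
{
	if (vmsa_ref(t))
		return false;
	return copy != NULL;
}
```

Encoding the owner in the enumerator means an access such as `t->gpa` on a KVM-owned page, or `t->page` on a guest-owned one, stands out immediately in review.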
* [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage 2026-06-11 12:35 [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests Jörg Rödel 2026-06-11 12:35 ` [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs Jörg Rödel @ 2026-06-11 12:35 ` Jörg Rödel 2026-06-16 21:33 ` Tom Lendacky 2026-06-11 12:35 ` [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching Jörg Rödel 2026-06-11 12:35 ` [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE Jörg Rödel 3 siblings, 1 reply; 12+ messages in thread From: Jörg Rödel @ 2026-06-11 12:35 UTC (permalink / raw) To: Sean Christopherson, Paolo Bonzini Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm, Joerg Roedel From: Joerg Roedel <joerg.roedel@amd.com> Do not allocate a KVM-managed VMSA for all VCPUs on VCPU creation, defer it to the LAUNCH_FINISH stage of SEV-ES and SEV-SNP. At this stage the VMSAs get used for the first time. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> --- arch/x86/kvm/svm/sev.c | 40 +++++++++++++++++++++++----------------- 1 file changed, 23 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 9b1280222e20..350bb97c32c0 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1095,11 +1095,11 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm) { struct kvm_vcpu *vcpu = &svm->vcpu; struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm); - struct sev_es_save_area *save = sev_es_vmsa_ref(vcpu); + struct sev_es_save_area *save; struct xregs_state *xsave; const u8 *s; + int ret, i; u8 *d; - int i; lockdep_assert_held(&vcpu->mutex); @@ -1110,6 +1110,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm) if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1)) return -EINVAL; + ret = sev_es_vcpu_alloc_vmsa(vcpu); + if (ret) + return ret; + + save = sev_es_vmsa_ref(vcpu); + /* * SEV-ES will use a VMSA that is pointed to by the VMCB, not * the traditional 
VMSA that is part of the VMCB. Copy the @@ -1196,7 +1202,7 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, { struct sev_data_launch_update_vmsa vmsa; struct vcpu_svm *svm = to_svm(vcpu); - void *vmsa_ref = sev_es_vmsa_ref(vcpu); + void *vmsa_ref; int ret; if (vcpu->guest_debug) { @@ -1209,6 +1215,8 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, if (ret) return ret; + vmsa_ref = sev_es_vmsa_ref(vcpu); + /* * The LAUNCH_UPDATE_VMSA command will perform in-place encryption of * the VMSA memory content (i.e it will write the same memory region @@ -1237,6 +1245,9 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, fpstate_set_confidential(&vcpu->arch.guest_fpu); vcpu->arch.guest_state_protected = true; + /* VMSA encrypted - put it into the VMCB */ + svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(vcpu); + /* * SEV-ES guest mandates LBR Virtualization to be _always_ ON. Enable it * only after setting guest_state_protected because KVM_SET_MSRS allows @@ -2689,12 +2700,14 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) kvm_for_each_vcpu(i, vcpu, kvm) { struct vcpu_svm *svm = to_svm(vcpu); - void *vmsa = sev_es_vmsa_ref(vcpu); + void *vmsa; ret = sev_es_sync_vmsa(svm); if (ret) goto out; + vmsa = sev_es_vmsa_ref(vcpu); + ret = sev_es_vcpu_vmsa_make_private(vcpu); if (ret) goto out; @@ -2710,6 +2723,10 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) } svm->vcpu.arch.guest_state_protected = true; + + /* VMSA encrypted - put it into the VMCB */ + svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(vcpu); + /* * SEV-ES (and thus SNP) guest mandates LBR Virtualization to * be _always_ ON. 
Enable it only after setting @@ -4914,22 +4931,11 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event) int sev_vcpu_create(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); - int ret; mutex_init(&svm->sev_es.snp_vmsa_mutex); - if (!is_sev_es_guest(vcpu)) - return 0; - - /* - * SEV-ES guests require a separate (from the VMCB) VMSA page used to - * contain the encrypted register state of the guest. - */ - ret = sev_es_vcpu_alloc_vmsa(vcpu); - if (ret) - return ret; - - vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm); + if (is_sev_es_guest(vcpu)) + vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm); return 0; } -- 2.53.0
* Re: [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage 2026-06-11 12:35 ` [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage Jörg Rödel @ 2026-06-16 21:33 ` Tom Lendacky 0 siblings, 0 replies; 12+ messages in thread From: Tom Lendacky @ 2026-06-16 21:33 UTC (permalink / raw) To: Jörg Rödel, Sean Christopherson, Paolo Bonzini Cc: x86, Michael Roth, kvm, linux-kernel, coconut-svsm, Joerg Roedel On 6/11/26 07:35, Jörg Rödel wrote: > From: Joerg Roedel <joerg.roedel@amd.com> > > Do not allocate a KVM-managed VMSA for all VCPUs on VCPU creation, > defer it to the LAUNCH_FINISH stage of SEV-ES and SEV-SNP. At this > stage the VMSAs get used for the first time. > > Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> > --- > arch/x86/kvm/svm/sev.c | 40 +++++++++++++++++++++++----------------- > 1 file changed, 23 insertions(+), 17 deletions(-) > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 9b1280222e20..350bb97c32c0 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -1095,11 +1095,11 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm) > { > struct kvm_vcpu *vcpu = &svm->vcpu; > struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm); > - struct sev_es_save_area *save = sev_es_vmsa_ref(vcpu); > + struct sev_es_save_area *save; > struct xregs_state *xsave; > const u8 *s; > + int ret, i; > u8 *d; > - int i; > > lockdep_assert_held(&vcpu->mutex); > > @@ -1110,6 +1110,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm) > if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1)) > return -EINVAL; > > + ret = sev_es_vcpu_alloc_vmsa(vcpu); > + if (ret) > + return ret; > + > + save = sev_es_vmsa_ref(vcpu); > + > /* > * SEV-ES will use a VMSA that is pointed to by the VMCB, not > * the traditional VMSA that is part of the VMCB. 
Copy the > @@ -1196,7 +1202,7 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, > { > struct sev_data_launch_update_vmsa vmsa; > struct vcpu_svm *svm = to_svm(vcpu); > - void *vmsa_ref = sev_es_vmsa_ref(vcpu); > + void *vmsa_ref; > int ret; > > if (vcpu->guest_debug) { > @@ -1209,6 +1215,8 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, > if (ret) > return ret; > > + vmsa_ref = sev_es_vmsa_ref(vcpu); Maybe just a comment above this that indicates if the sync was successful then a VMSA has been allocated and only now can you get a pointer to it ? > + > /* > * The LAUNCH_UPDATE_VMSA command will perform in-place encryption of > * the VMSA memory content (i.e it will write the same memory region > @@ -1237,6 +1245,9 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, > fpstate_set_confidential(&vcpu->arch.guest_fpu); > vcpu->arch.guest_state_protected = true; > > + /* VMSA encrypted - put it into the VMCB */ > + svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(vcpu); > + > /* > * SEV-ES guest mandates LBR Virtualization to be _always_ ON. Enable it > * only after setting guest_state_protected because KVM_SET_MSRS allows > @@ -2689,12 +2700,14 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) > > kvm_for_each_vcpu(i, vcpu, kvm) { > struct vcpu_svm *svm = to_svm(vcpu); > - void *vmsa = sev_es_vmsa_ref(vcpu); > + void *vmsa; > > ret = sev_es_sync_vmsa(svm); > if (ret) > goto out; > > + vmsa = sev_es_vmsa_ref(vcpu); > + > ret = sev_es_vcpu_vmsa_make_private(vcpu); > if (ret) > goto out; > @@ -2710,6 +2723,10 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) > } > > svm->vcpu.arch.guest_state_protected = true; > + > + /* VMSA encrypted - put it into the VMCB */ > + svm->vmcb->control.vmsa_pa = sev_es_vmsa_pa(vcpu); > + > /* > * SEV-ES (and thus SNP) guest mandates LBR Virtualization to > * be _always_ ON. 
Enable it only after setting > @@ -4914,22 +4931,11 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event) > int sev_vcpu_create(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > - int ret; > > mutex_init(&svm->sev_es.snp_vmsa_mutex); > > - if (!is_sev_es_guest(vcpu)) > - return 0; If you leave this here, it might make it easier to add things in the future should something be needed. Just tack them on at the bottom. Not necessary, though. Thanks, Tom > - > - /* > - * SEV-ES guests require a separate (from the VMCB) VMSA page used to > - * contain the encrypted register state of the guest. > - */ > - ret = sev_es_vcpu_alloc_vmsa(vcpu); > - if (ret) > - return ret; > - > - vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm); > + if (is_sev_es_guest(vcpu)) > + vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm); > > return 0; > }
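A user-space sketch of the ordering that the suggested comment would document: with allocation moved into the sync step, validation comes first, allocation second, and a VMSA pointer can only be taken after a successful sync. vcpu_model, model_alloc_vmsa(), model_vmsa_ref() and model_sync_vmsa() are hypothetical names, and making the allocation idempotent is an assumption of this sketch (the body of sev_es_vcpu_alloc_vmsa() is not quoted in this excerpt).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical per-vCPU model: only what is needed to show the ordering. */
struct vcpu_model {
	int guest_debug; /* stands in for the guest_debug / DR7 checks */
	void *vmsa;      /* NULL until the first successful sync */
};

/* Lazily allocate the VMSA page; idempotent by assumption. */
static int model_alloc_vmsa(struct vcpu_model *v)
{
	if (v->vmsa)
		return 0;
	v->vmsa = calloc(1, MODEL_PAGE_SIZE);
	return v->vmsa ? 0 : -ENOMEM;
}

static void *model_vmsa_ref(const struct vcpu_model *v)
{
	return v->vmsa;
}

/* Validate, then allocate, then hand out the pointer. */
static int model_sync_vmsa(struct vcpu_model *v, void **out)
{
	int ret;

	if (v->guest_debug)
		return -EINVAL; /* rejected before anything is allocated */

	ret = model_alloc_vmsa(v);
	if (ret)
		return ret;

	/* A successful sync guarantees a VMSA exists; only now take a reference. */
	*out = model_vmsa_ref(v);
	assert(*out != NULL);
	return 0;
}
```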
* [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching
From: Jörg Rödel @ 2026-06-11 12:35 UTC
To: Sean Christopherson, Paolo Bonzini
Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm,
Joerg Roedel
From: Joerg Roedel <joerg.roedel@amd.com>
Introduce a way to provide a guest GPA as the initial BSP VMSA and
avoid allocating KVM-managed VMSAs in this case. Only one
guest-provided VMSA is supported at the moment as IGVM also only
supports to set a single VMSA.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> --- arch/x86/kvm/svm/sev.c | 62 ++++++++++++++++++++++++++++++------------ arch/x86/kvm/svm/svm.h | 1 + 2 files changed, 45 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 350bb97c32c0..88db83b3ff8e 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -726,6 +726,7 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp, INIT_LIST_HEAD(&sev->regions_list); INIT_LIST_HEAD(&sev->mirror_vms); + sev->initial_vmsa_gpa = INVALID_PAGE; sev->need_init = false; kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV); @@ -2680,6 +2681,46 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp) return 0; } +static int snp_init_guest_vmsa(struct kvm_vcpu *vcpu, gpa_t vmsa_gpa) +{ + /* Only one initial guest VMSA can exist (per IGVM) - so it belongs to the BSP */ + if (vcpu->vcpu_idx != 0) + return 0; + + /* VMSA already private and encrypted via LAUNCH_UPDATE */ + sev_es_set_guest_vmsa(vcpu, vmsa_gpa); + + return 0; +} + +static int snp_init_kvm_vmsa(struct kvm_vcpu *vcpu, + struct sev_data_snp_launch_update *data, + struct kvm_sev_cmd *argp) +{ + struct vcpu_svm *svm = to_svm(vcpu); + int ret; + void *vmsa; + + ret = sev_es_sync_vmsa(svm); + if (ret) + return ret; + + vmsa = sev_es_vmsa_ref(vcpu); + + ret = sev_es_vcpu_vmsa_make_private(vcpu); + if (ret) + return ret; + + /* Issue the SNP command to encrypt the VMSA */ + data->address = __sme_pa(vmsa); + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, + data, &argp->error); + if (ret) + sev_snp_vcpu_reclaim_vmsa(vcpu); + + return ret; +} + static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) { struct kvm_sev_info *sev = to_kvm_sev_info(kvm); @@ -2700,28 +2741,13 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) kvm_for_each_vcpu(i, vcpu, kvm) { struct vcpu_svm *svm = to_svm(vcpu); - void *vmsa; - ret = 
sev_es_sync_vmsa(svm); + ret = VALID_PAGE(sev->initial_vmsa_gpa) ? + snp_init_guest_vmsa(vcpu, sev->initial_vmsa_gpa) : + snp_init_kvm_vmsa(vcpu, &data, argp); if (ret) goto out; - vmsa = sev_es_vmsa_ref(vcpu); - - ret = sev_es_vcpu_vmsa_make_private(vcpu); - if (ret) - goto out; - - /* Issue the SNP command to encrypt the VMSA */ - data.address = __sme_pa(vmsa); - ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, - &data, &argp->error); - if (ret) { - sev_snp_vcpu_reclaim_vmsa(vcpu); - - goto out; - } - svm->vcpu.arch.guest_state_protected = true; /* VMSA encrypted - put it into the VMCB */ diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 3d4799f09b23..cc7e84c230bb 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -117,6 +117,7 @@ struct kvm_sev_info { struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */ cpumask_var_t have_run_cpus; /* CPUs that have done VMRUN for this VM. */ bool snp_certs_enabled; /* SNP certificate-fetching support. */ + gpa_t initial_vmsa_gpa; /* Optinal GPA of BSP VMSA - SEV-SNP only */ }; #endif -- 2.53.0
* Re: [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching 2026-06-11 12:35 ` [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching Jörg Rödel @ 2026-06-16 21:48 ` Tom Lendacky 0 siblings, 0 replies; 12+ messages in thread From: Tom Lendacky @ 2026-06-16 21:48 UTC (permalink / raw) To: Jörg Rödel, Sean Christopherson, Paolo Bonzini Cc: x86, Michael Roth, kvm, linux-kernel, coconut-svsm, Joerg Roedel On 6/11/26 07:35, Jörg Rödel wrote: > From: Joerg Roedel <joerg.roedel@amd.com> > > Introduce a way to provide a guest GPA as the initial BSP VMSA and > avoid allocating KVM-managed VMSAs in this case. Only one > guest-provided VMSA is supported at the moment as IGVM also only > supports to set a single VMSA. > > Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> > --- > arch/x86/kvm/svm/sev.c | 62 ++++++++++++++++++++++++++++++------------ > arch/x86/kvm/svm/svm.h | 1 + > 2 files changed, 45 insertions(+), 18 deletions(-) > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 350bb97c32c0..88db83b3ff8e 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -726,6 +726,7 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp, > > INIT_LIST_HEAD(&sev->regions_list); > INIT_LIST_HEAD(&sev->mirror_vms); > + sev->initial_vmsa_gpa = INVALID_PAGE; > sev->need_init = false; > > kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV); > @@ -2680,6 +2681,46 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp) > return 0; > } > > +static int snp_init_guest_vmsa(struct kvm_vcpu *vcpu, gpa_t vmsa_gpa) > +{ > + /* Only one initial guest VMSA can exist (per IGVM) - so it belongs to the BSP */ Maybe expand this comment to indicate that none of the other vCPU VMSAs are created by KVM, that the guest is responsible for creating them for the first time. Which reminds me that you will need to provide the GHCB APIC ID List NAE event support. 
If OVMF was ever to be built as an IGVM file, then without that GHCB event support it will perform a broadcast INIT-SIPI for the first AP startup, which will fail because no VMSAs will have been created. If OVMF sees that the HV has advertised the event, then it will create all the VMSAs itself and use the GHCB AP Create NAE event for initial startup of the APs. > + if (vcpu->vcpu_idx != 0) > + return 0; > + > + /* VMSA already private and encrypted via LAUNCH_UPDATE */ > + sev_es_set_guest_vmsa(vcpu, vmsa_gpa); > + > + return 0; > +} > + > +static int snp_init_kvm_vmsa(struct kvm_vcpu *vcpu, > + struct sev_data_snp_launch_update *data, > + struct kvm_sev_cmd *argp) > +{ > + struct vcpu_svm *svm = to_svm(vcpu); > + int ret; > + void *vmsa; > + > + ret = sev_es_sync_vmsa(svm); > + if (ret) > + return ret; > + > + vmsa = sev_es_vmsa_ref(vcpu); > + > + ret = sev_es_vcpu_vmsa_make_private(vcpu); > + if (ret) > + return ret; > + > + /* Issue the SNP command to encrypt the VMSA */ > + data->address = __sme_pa(vmsa); > + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, > + data, &argp->error); > + if (ret) > + sev_snp_vcpu_reclaim_vmsa(vcpu); > + > + return ret; > +} > + > static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) > { > struct kvm_sev_info *sev = to_kvm_sev_info(kvm); > @@ -2700,28 +2741,13 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp) > > kvm_for_each_vcpu(i, vcpu, kvm) { > struct vcpu_svm *svm = to_svm(vcpu); > - void *vmsa; > > - ret = sev_es_sync_vmsa(svm); > + ret = VALID_PAGE(sev->initial_vmsa_gpa) ? 
> + snp_init_guest_vmsa(vcpu, sev->initial_vmsa_gpa) : > + snp_init_kvm_vmsa(vcpu, &data, argp); > if (ret) > goto out; > > - vmsa = sev_es_vmsa_ref(vcpu); > - > - ret = sev_es_vcpu_vmsa_make_private(vcpu); > - if (ret) > - goto out; > - > - /* Issue the SNP command to encrypt the VMSA */ > - data.address = __sme_pa(vmsa); > - ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, > - &data, &argp->error); > - if (ret) { > - sev_snp_vcpu_reclaim_vmsa(vcpu); > - > - goto out; > - } > - > svm->vcpu.arch.guest_state_protected = true; > > /* VMSA encrypted - put it into the VMCB */ > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h > index 3d4799f09b23..cc7e84c230bb 100644 > --- a/arch/x86/kvm/svm/svm.h > +++ b/arch/x86/kvm/svm/svm.h > @@ -117,6 +117,7 @@ struct kvm_sev_info { > struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */ > cpumask_var_t have_run_cpus; /* CPUs that have done VMRUN for this VM. */ > bool snp_certs_enabled; /* SNP certificate-fetching support. */ > + gpa_t initial_vmsa_gpa; /* Optinal GPA of BSP VMSA - SEV-SNP only */ s/Optinal/Optional/ Should it be called bsp_vmsa_gpa ? Thanks, Tom > }; > #endif >
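A small model of the per-vCPU choice the launch loop in snp_launch_update_vmsa() makes after this patch, assuming MODEL_INVALID_PAGE stands in for INVALID_PAGE. launch_vmsa_for_vcpu() and count_kvm_vmsas() are hypothetical helpers: with an image-provided VMSA, vCPU 0 takes the guest path, every other vCPU gets no VMSA at all, and the number of KVM-built VMSAs is zero regardless of the vCPU count. The LAUNCH_VMSA_NONE case is why the guest has to create AP VMSAs itself, as raised in the review above.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_INVALID_PAGE (~0ULL) /* stand-in for INVALID_PAGE */
#define MODEL_VALID_PAGE(x) ((x) != MODEL_INVALID_PAGE)

enum launch_vmsa {
	LAUNCH_VMSA_NONE,  /* AP: no VMSA at launch, the guest must create one */
	LAUNCH_VMSA_GUEST, /* BSP: the measured, image-provided page */
	LAUNCH_VMSA_KVM,   /* KVM allocates, syncs and encrypts a VMSA */
};

/* Per-vCPU choice: snp_init_guest_vmsa() path vs. snp_init_kvm_vmsa() path. */
static enum launch_vmsa launch_vmsa_for_vcpu(unsigned int vcpu_idx,
					     uint64_t initial_vmsa_gpa)
{
	if (!MODEL_VALID_PAGE(initial_vmsa_gpa))
		return LAUNCH_VMSA_KVM;
	return vcpu_idx == 0 ? LAUNCH_VMSA_GUEST : LAUNCH_VMSA_NONE;
}

/* Number of KVM-built VMSAs the launch-finish loop would encrypt and measure. */
static unsigned int count_kvm_vmsas(unsigned int nr_vcpus,
				    uint64_t initial_vmsa_gpa)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < nr_vcpus; i++)
		if (launch_vmsa_for_vcpu(i, initial_vmsa_gpa) == LAUNCH_VMSA_KVM)
			n++;
	return n;
}
```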
* [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE 2026-06-11 12:35 [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests Jörg Rödel ` (2 preceding siblings ...) 2026-06-11 12:35 ` [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching Jörg Rödel @ 2026-06-11 12:35 ` Jörg Rödel 2026-06-11 12:43 ` Sean Christopherson 2026-06-16 22:11 ` Tom Lendacky 3 siblings, 2 replies; 12+ messages in thread From: Jörg Rödel @ 2026-06-11 12:35 UTC (permalink / raw) To: Sean Christopherson, Paolo Bonzini Cc: x86, Tom Lendacky, Michael Roth, kvm, linux-kernel, coconut-svsm, Joerg Roedel From: Joerg Roedel <joerg.roedel@amd.com> Support setting a VMSA in guest physical memory during the SEV-SNP launch process. Only one VMSA can be provided which will then be used for the BSP. All of the APs will not have a VMSA allocated or assigned when this feature is used. This ensures stable launch measurements on SEV-SNP which are independent of the number of VCPUs the VM is launched with. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> --- arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/svm/sev.c | 44 ++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 1 + include/uapi/linux/kvm.h | 1 + 4 files changed, 46 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 5f2b30d0405c..fc87a5ba295b 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -885,6 +885,7 @@ struct kvm_sev_snp_launch_start { /* Kept in sync with firmware values for simplicity. 
*/ #define KVM_SEV_PAGE_TYPE_INVALID 0x0 #define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1 +#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2 #define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3 #define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4 #define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5 diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 88db83b3ff8e..90399d5d0358 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2520,6 +2520,20 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp) return rc; } +static bool snp_check_launch_vmsa(struct kvm_sev_info *sev, + struct sev_es_save_area *vmsa) +{ + /* VMSA sev_features must match VMs vmsa_features */ + if (vmsa->sev_features != sev->vmsa_features) + return false; + + /* Must always boot from VMPL0 */ + if (vmsa->vmpl != 0) + return false; + + return true; +} + struct sev_gmem_populate_args { __u8 type; int sev_fd; @@ -2532,7 +2546,9 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, struct sev_gmem_populate_args *sev_populate_args = opaque; struct sev_data_snp_launch_update fw_args = {0}; struct kvm_sev_info *sev = to_kvm_sev_info(kvm); + gpa_t gpa = gfn << PAGE_SHIFT; bool assigned = false; + u64 sev_features = 0; int level; int ret; @@ -2550,14 +2566,27 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, if (src_page) { void *src_vaddr = kmap_local_page(src_page); void *dst_vaddr = kmap_local_pfn(pfn); + struct sev_es_save_area *vmsa = dst_vaddr; + bool accept_page = true; memcpy(dst_vaddr, src_vaddr, PAGE_SIZE); + if (sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_VMSA) { + accept_page = snp_check_launch_vmsa(sev, vmsa); + if (accept_page) + sev_features = vmsa->sev_features; + } + kunmap_local(src_vaddr); kunmap_local(dst_vaddr); + + if (!accept_page) { + ret = -EINVAL; + goto out; + } } - ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, PG_LEVEL_4K, + ret = rmp_make_private(pfn, gpa, PG_LEVEL_4K, sev_get_asid(kvm), true); if (ret) goto out; @@ 
-2593,6 +2622,9 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, kunmap_local(dst_vaddr); } + if (ret == 0 && sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_VMSA) + sev->initial_vmsa_gpa = gpa; + out: if (ret) pr_debug("%s: error updating GFN %llx, return code %d (fw_error %d)\n", @@ -2620,12 +2652,22 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp) if (!params.len || !PAGE_ALIGNED(params.len) || params.flags || (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL && + params.type != KVM_SEV_SNP_PAGE_TYPE_VMSA && params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO && params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED && params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS && params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID)) return -EINVAL; + if (params.type == KVM_SEV_SNP_PAGE_TYPE_VMSA) { + /* VMSA page are allowed only once */ + if (sev->initial_vmsa_gpa != INVALID_PAGE) + return -EBUSY; + /* Can only deploy a single page as VMSA */ + if (params.len != PAGE_SIZE) + return -EINVAL; + } + src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? 
NULL : u64_to_user_ptr(params.uaddr); if (!PAGE_ALIGNED(src)) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0550359ed798..dc9abe62476e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4870,6 +4870,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_MEMORY_FAULT_INFO: case KVM_CAP_X86_GUEST_MODE: case KVM_CAP_ONE_REG: + case KVM_CAP_SNP_DIRECT_VMSA: r = 1; break; case KVM_CAP_PRE_FAULT_MEMORY: diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6c8afa2047bf..bf034435f98c 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -996,6 +996,7 @@ struct kvm_enable_cap { #define KVM_CAP_S390_USER_OPEREXEC 246 #define KVM_CAP_S390_KEYOP 247 #define KVM_CAP_S390_VSIE_ESAMODE 248 +#define KVM_CAP_SNP_DIRECT_VMSA 249 struct kvm_irq_routing_irqchip { __u32 irqchip; -- 2.53.0
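The two new validation steps in this patch can be modelled in user space as below. struct vmsa_model is a hypothetical reduction to the two fields the patch reads (the real struct sev_es_save_area has a fixed hardware layout), and check_launch_vmsa() / check_vmsa_update() are illustrative names mirroring snp_check_launch_vmsa() and the checks added to snp_launch_update().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE      4096ULL
#define MODEL_INVALID_PAGE   (~0ULL) /* stand-in for INVALID_PAGE */
#define MODEL_PAGE_TYPE_VMSA 0x2     /* value of KVM_SEV_SNP_PAGE_TYPE_VMSA */

/* Hypothetical reduced VMSA: only the two fields the patch inspects. */
struct vmsa_model {
	uint8_t vmpl;
	uint64_t sev_features;
};

/* Mirrors snp_check_launch_vmsa(): features must match the VM, VMPL must be 0. */
static bool check_launch_vmsa(uint64_t vm_vmsa_features,
			      const struct vmsa_model *vmsa)
{
	if (vmsa->sev_features != vm_vmsa_features)
		return false;
	return vmsa->vmpl == 0;
}

/* Mirrors the extra SNP_LAUNCH_UPDATE argument checks for the VMSA type. */
static int check_vmsa_update(uint32_t type, uint64_t len,
			     uint64_t initial_vmsa_gpa)
{
	if (type != MODEL_PAGE_TYPE_VMSA)
		return 0;               /* other page types are unaffected */
	if (initial_vmsa_gpa != MODEL_INVALID_PAGE)
		return -EBUSY;          /* only one VMSA page per launch */
	if (len != MODEL_PAGE_SIZE)
		return -EINVAL;         /* exactly one page */
	return 0;
}
```

Note the ordering in the patch: the "only once" check runs before the firmware call, while sev->initial_vmsa_gpa is only recorded after sev_gmem_post_populate() succeeds, so a rejected or failed VMSA upload can be retried by userspace.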
* Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
From: Sean Christopherson @ 2026-06-11 12:43 UTC
To: Jörg Rödel
Cc: Paolo Bonzini, x86, Tom Lendacky, Michael Roth, kvm, linux-kernel,
coconut-svsm, Joerg Roedel

On Thu, Jun 11, 2026, Jörg Rödel wrote:
> From: Joerg Roedel <joerg.roedel@amd.com>
>
> Support setting a VMSA in guest physical memory during the SEV-SNP
> launch process. Only one VMSA can be provided which will then be used
> for the BSP. All of the APs will not have a VMSA allocated or assigned
> when this feature is used.
>
> This ensures stable launch measurements on SEV-SNP which are
> independent of the number of VCPUs the VM is launched with.

This needs a *much* longer explanation and more justification for exactly why
this needs to be handled in KVM. I understand most of the words and acronyms,
but that's about where my understanding stops.
* Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
From: Jörg Rödel @ 2026-06-11 13:23 UTC
To: Sean Christopherson
Cc: Paolo Bonzini, x86, Tom Lendacky, Michael Roth, kvm, linux-kernel,
coconut-svsm, Joerg Roedel

Hi Sean,

On Thu, Jun 11, 2026 at 05:43:05AM -0700, Sean Christopherson wrote:
> On Thu, Jun 11, 2026, Jörg Rödel wrote:
> > From: Joerg Roedel <joerg.roedel@amd.com>
> >
> > Support setting a VMSA in guest physical memory during the SEV-SNP
> > launch process. Only one VMSA can be provided which will then be used
> > for the BSP. All of the APs will not have a VMSA allocated or assigned
> > when this feature is used.
> >
> > This ensures stable launch measurements on SEV-SNP which are
> > independent of the number of VCPUs the VM is launched with.
>
> This needs a *much* longer explanation and more justification for exactly why
> this needs to be handled in KVM. I understand most of the words and acronyms,
> but that's about where my understanding stops.

Sure, how about:

For SEV-SNP VMs KVM currently allocates and measures one VMSA per VCPU into the
initial memory image. Historically this behavior comes from the SEV-ES
implementation, which has no concept of a guest-provided or guest-owned VMSA.
So on SEV-ES there is no other choice than allocating the VMSAs in KVM.

In contrast, on SEV-SNP each VMSA has a GPA assigned and is (in theory)
guest-owned, so that the old SEV-ES behavior of letting KVM manage the
VMSAs causes several problems (especially together with IGVM-loading)
and inefficiencies:

  1. With the current KVM behavior the initial launch measurement depends
     on the number of VCPUs the VM has assigned.

  2. Current SEV-SNP guest code will not use the KVM-allocated VMSAs for
     APs. Both EDK2 and the Linux kernel will allocate and provide their
     own VMSA pages for every AP. So the current allocation dance KVM is
     doing is useless for the APs.

  3. The current behavior makes it impossible to implement the
     IGVM-promise of a predictable launch measurement derived from only
     the IGVM file and the target platform.

To solve these problems this patch adds support to measure an IGVM-provided
VMSA page into the initial SEV-SNP memory image. Only one VMSA page is
supported for now, which aligns with the IGVM requirement that each file can
only provide one VP-context. The VMSA will be checked by KVM for supported SEV
features and VMPL0 before being accepted.

When a VMSA page is measured in this way it will be used as the launch VMSA of
the BSP for the VM. For all other VCPUs KVM will not allocate or measure VMSA
pages, keeping the launch measurement in sync with the IGVM image. The guest
has to provide VMSAs for all APs it intends to use, which common guest
components already do anyway.

When the feature is not used the current behavior is preserved.

The changes have been tested together with the KVM planes patches and
COCONUT-SVSM and showed that using this feature leads to a launch measurement
matching the IGVM-prediction.

?

-Joerg
* Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE
From: Sean Christopherson @ 2026-06-16 17:55 UTC
To: Jörg Rödel
Cc: Paolo Bonzini, x86, Tom Lendacky, Michael Roth, kvm, linux-kernel,
coconut-svsm, Joerg Roedel, Jethro Beekman

+Jethro

On Thu, Jun 11, 2026, Jörg Rödel wrote:
> Hi Sean,
>
> On Thu, Jun 11, 2026 at 05:43:05AM -0700, Sean Christopherson wrote:
> > On Thu, Jun 11, 2026, Jörg Rödel wrote:
> > > From: Joerg Roedel <joerg.roedel@amd.com>
> > >
> > > Support setting a VMSA in guest physical memory during the SEV-SNP
> > > launch process. Only one VMSA can be provided which will then be used
> > > for the BSP. All of the APs will not have a VMSA allocated or assigned
> > > when this feature is used.
> > >
> > > This ensures stable launch measurements on SEV-SNP which are
> > > independent of the number of VCPUs the VM is launched with.
> >
> > This needs a *much* longer explanation and more justification for exactly why
> > this needs to be handled in KVM. I understand most of the words and acronyms,
> > but that's about where my understanding stops.
>
> Sure, how about:
>
> For SEV-SNP VMs KVM currently allocates and measures one VMSA per VCPU into the
> initial memory image. Historically this behavior comes from the SEV-ES
> implementation, which has no concept of a guest-provided or guest-owned VMSA.
> So on SEV-ES there is no other choice than allocating the VMSAs in KVM.
>
> In contrast, on SEV-SNP each VMSA has a GPA assigned and is (in theory)
> guest-owned, so that the old SEV-ES behavior of letting KVM manage the
> VMSAs causes several problems (especially together with IGVM-loading)
> and inefficiencies:
>
>   1. With the current KVM behavior the initial launch measurement depends
>      on the number of VCPUs the VM has assigned.
>
>   2. Current SEV-SNP guest code will not use the KVM-allocated VMSAs for
>      APs. Both EDK2 and the Linux kernel will allocate and provide their
>      own VMSA pages for every AP. So the current allocation dance KVM is
>      doing is useless for the APs.
>
>   3. The current behavior makes it impossible to implement the
>      IGVM-promise of a predictable launch measurement derived from only
>      the IGVM file and the target platform.
>
> To solve these problems this patch adds support to measure an IGVM-provided
> VMSA page into the initial SEV-SNP memory image. Only one VMSA page is
> supported for now, which aligns with the IGVM requirement that each file can
> only provide one VP-context. The VMSA will be checked by KVM for supported SEV
> features and VMPL0 before being accepted.
>
> When a VMSA page is measured in this way it will be used as the launch VMSA of
> the BSP for the VM. For all other VCPUs KVM will not allocate or measure VMSA
> pages, keeping the launch measurement in sync with the IGVM image. The guest
> has to provide VMSAs for all APs it intends to use, which common guest
> components already do anyway.

Isn't this essentially the same thing as hot-plugging vCPUs after launch? I have
yet to review it in depth (sorry Jethro), but it looks a *lot* simpler.

https://lore.kernel.org/all/20d3a189-5649-4864-81cd-5a421267f21b@fortanix.com
* Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE

From: Tom Lendacky @ 2026-06-16 22:11 UTC
To: Jörg Rödel, Sean Christopherson, Paolo Bonzini
Cc: x86, Michael Roth, kvm, linux-kernel, coconut-svsm, Joerg Roedel

On 6/11/26 07:35, Jörg Rödel wrote:
> From: Joerg Roedel <joerg.roedel@amd.com>
>
> Support setting a VMSA in guest physical memory during the SEV-SNP
> launch process. Only one VMSA can be provided which will then be used
> for the BSP. All of the APs will not have a VMSA allocated or assigned
> when this feature is used.
>
> This ensures stable launch measurements on SEV-SNP which are
> independent of the number of VCPUs the VM is launched with.
>
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/svm/sev.c          | 44 ++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c              |  1 +
>  include/uapi/linux/kvm.h        |  1 +
>  4 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 5f2b30d0405c..fc87a5ba295b 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -885,6 +885,7 @@ struct kvm_sev_snp_launch_start {
>  /* Kept in sync with firmware values for simplicity. */
>  #define KVM_SEV_PAGE_TYPE_INVALID		0x0
>  #define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
> +#define KVM_SEV_SNP_PAGE_TYPE_VMSA		0x2
>  #define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
>  #define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
>  #define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 88db83b3ff8e..90399d5d0358 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2520,6 +2520,20 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return rc;
>  }
>
> +static bool snp_check_launch_vmsa(struct kvm_sev_info *sev,
> +				  struct sev_es_save_area *vmsa)
> +{
> +	/* VMSA sev_features must match VMs vmsa_features */
> +	if (vmsa->sev_features != sev->vmsa_features)
> +		return false;
> +
> +	/* Must always boot from VMPL0 */
> +	if (vmsa->vmpl != 0)
> +		return false;
> +
> +	return true;
> +}
> +
>  struct sev_gmem_populate_args {
>  	__u8 type;
>  	int sev_fd;
> @@ -2532,7 +2546,9 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>  	struct sev_gmem_populate_args *sev_populate_args = opaque;
>  	struct sev_data_snp_launch_update fw_args = {0};
>  	struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
> +	gpa_t gpa = gfn << PAGE_SHIFT;
>  	bool assigned = false;
> +	u64 sev_features = 0;
>  	int level;
>  	int ret;
>
> @@ -2550,14 +2566,27 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>  	if (src_page) {
>  		void *src_vaddr = kmap_local_page(src_page);
>  		void *dst_vaddr = kmap_local_pfn(pfn);
> +		struct sev_es_save_area *vmsa = dst_vaddr;
> +		bool accept_page = true;
>
>  		memcpy(dst_vaddr, src_vaddr, PAGE_SIZE);
>
> +		if (sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_VMSA) {
> +			accept_page = snp_check_launch_vmsa(sev, vmsa);
> +			if (accept_page)
> +				sev_features = vmsa->sev_features;
> +		}

I don't think there is a race here given the way guest_memfd works today.
I haven't followed in-place conversion closely, but will that result in a
race between when the snp_check_launch_vmsa() check is performed and
before the page is made a firmware page?

> +
>  		kunmap_local(src_vaddr);
>  		kunmap_local(dst_vaddr);
> +
> +		if (!accept_page) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>  	}
>
> -	ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, PG_LEVEL_4K,
> +	ret = rmp_make_private(pfn, gpa, PG_LEVEL_4K,
>  			       sev_get_asid(kvm), true);
>  	if (ret)
>  		goto out;
> @@ -2593,6 +2622,9 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>  		kunmap_local(dst_vaddr);
>  	}
>
> +	if (ret == 0 && sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_VMSA)

s/ret == 0/!ret/

Thanks,
Tom

> +		sev->initial_vmsa_gpa = gpa;
> +
>  out:
>  	if (ret)
>  		pr_debug("%s: error updating GFN %llx, return code %d (fw_error %d)\n",
> @@ -2620,12 +2652,22 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>  	if (!params.len || !PAGE_ALIGNED(params.len) || params.flags ||
>  	    (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
> +	     params.type != KVM_SEV_SNP_PAGE_TYPE_VMSA &&
>  	     params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
>  	     params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED &&
>  	     params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS &&
>  	     params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID))
>  		return -EINVAL;
>
> +	if (params.type == KVM_SEV_SNP_PAGE_TYPE_VMSA) {
> +		/* VMSA page are allowed only once */
> +		if (sev->initial_vmsa_gpa != INVALID_PAGE)
> +			return -EBUSY;
> +		/* Can only deploy a single page as VMSA */
> +		if (params.len != PAGE_SIZE)
> +			return -EINVAL;
> +	}
> +
>  	src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? NULL : u64_to_user_ptr(params.uaddr);
>
>  	if (!PAGE_ALIGNED(src))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0550359ed798..dc9abe62476e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4870,6 +4870,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_MEMORY_FAULT_INFO:
>  	case KVM_CAP_X86_GUEST_MODE:
>  	case KVM_CAP_ONE_REG:
> +	case KVM_CAP_SNP_DIRECT_VMSA:
>  		r = 1;
>  		break;
>  	case KVM_CAP_PRE_FAULT_MEMORY:
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6c8afa2047bf..bf034435f98c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -996,6 +996,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_S390_USER_OPEREXEC 246
>  #define KVM_CAP_S390_KEYOP 247
>  #define KVM_CAP_S390_VSIE_ESAMODE 248
> +#define KVM_CAP_SNP_DIRECT_VMSA 249
>
>  struct kvm_irq_routing_irqchip {
>  	__u32 irqchip;
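The VMSA-related checks in the diff above can be exercised outside the kernel with a reduced model. This is a hypothetical sketch:

- `struct toy_sev_info`, `struct toy_save_area`, `TOY_INVALID_GPA` and the `toy_*` helpers are simplified stand-ins. They model `struct kvm_sev_info`, `struct sev_es_save_area`, `INVALID_PAGE` and the logic in `snp_launch_update()` and `sev_gmem_post_populate()`.
- It does not model firmware commands, RMP updates or guest_memfd.

It follows the order of checks in the patch:

1. A VMSA request is refused with -EBUSY once a launch VMSA has been recorded.
2. A request that is not exactly one page is refused with -EINVAL.
3. The page's `sev_features` must match the VM's `vmsa_features`, and its VMPL must be 0.
4. The GPA is recorded only on success, using `!ret` as suggested in the review.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; page type values mirror the uapi defines in the diff. */
#define TOY_PAGE_TYPE_NORMAL	0x1
#define TOY_PAGE_TYPE_VMSA	0x2
#define TOY_PAGE_SIZE		4096ULL
#define TOY_INVALID_GPA		(~0ULL)	/* plays the role of INVALID_PAGE */

/* Reduced model of struct kvm_sev_info: only the fields used here. */
struct toy_sev_info {
	uint64_t vmsa_features;		/* SEV features the VM was started with */
	uint64_t initial_vmsa_gpa;	/* TOY_INVALID_GPA until a VMSA is accepted */
};

/* Reduced model of struct sev_es_save_area: only the checked fields. */
struct toy_save_area {
	uint64_t sev_features;
	uint8_t vmpl;
};

/* Same two checks as snp_check_launch_vmsa() in the patch. */
static bool toy_check_launch_vmsa(const struct toy_sev_info *sev,
				  const struct toy_save_area *vmsa)
{
	assert(sev && vmsa);

	if (vmsa->sev_features != sev->vmsa_features)
		return false;	/* features must match the VM's */

	return vmsa->vmpl == 0;	/* must boot from VMPL0 */
}

/* Request-level checks that snp_launch_update() adds for the VMSA type. */
static int toy_check_vmsa_request(const struct toy_sev_info *sev,
				  uint32_t type, uint64_t len)
{
	if (type != TOY_PAGE_TYPE_VMSA)
		return 0;
	if (sev->initial_vmsa_gpa != TOY_INVALID_GPA)
		return -EBUSY;	/* only one launch VMSA per VM */
	if (len != TOY_PAGE_SIZE)
		return -EINVAL;	/* exactly one page */
	return 0;
}

/* Per-page step: validate contents, then record the GPA only on success. */
static int toy_populate_vmsa(struct toy_sev_info *sev, uint64_t gpa,
			     const struct toy_save_area *vmsa)
{
	int ret = toy_check_launch_vmsa(sev, vmsa) ? 0 : -EINVAL;

	if (!ret)
		sev->initial_vmsa_gpa = gpa;
	return ret;
}
```

The sketch splits the checks the same way the patch does. The -EBUSY and length checks sit in the request path, and the content checks sit in the per-page path. Because the model has no concurrent writer, it does not address the race between `snp_check_launch_vmsa()` and the RMP assignment that is raised in the review.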
Thread overview: 12+ messages
2026-06-11 12:35 [PATCH 0/4] KVM: SEV: Support direct setting of VMSA for SEV-SNP guests Jörg Rödel
2026-06-11 12:35 ` [PATCH 1/4] kvm: svm: Streamline VMSA setting for VCPUs Jörg Rödel
2026-06-16 20:52   ` Tom Lendacky
2026-06-11 12:35 ` [PATCH 2/4] kvm: svm: Defer VMSA allocation to LAUNCH_FINISH stage Jörg Rödel
2026-06-16 21:33   ` Tom Lendacky
2026-06-11 12:35 ` [PATCH 3/4] kvm: svm: Support guest-provided VMSA for launching Jörg Rödel
2026-06-16 21:48   ` Tom Lendacky
2026-06-11 12:35 ` [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE Jörg Rödel
2026-06-11 12:43   ` Sean Christopherson
2026-06-11 13:23     ` Jörg Rödel
2026-06-16 17:55       ` Sean Christopherson
2026-06-16 22:11   ` Tom Lendacky