From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Fred Griffoul <griffoul@gmail.com>, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, shuah@kernel.org,
dwmw@amazon.co.uk, linux-kselftest@vger.kernel.org,
linux-kernel@vger.kernel.org,
Fred Griffoul <fgriffo@amazon.co.uk>
Subject: Re: [PATCH v4 06/10] KVM: nVMX: Cache evmcs fields to ensure consistency during VM-entry
Date: Fri, 02 Jan 2026 16:40:03 +0100 [thread overview]
Message-ID: <87a4ywaur0.fsf@redhat.com> (raw)
In-Reply-To: <20260102142429.896101-7-griffoul@gmail.com>
Fred Griffoul <griffoul@gmail.com> writes:
> From: Fred Griffoul <fgriffo@amazon.co.uk>
>
> Cache enlightened VMCS control fields to prevent TOCTOU races where the
> guest could modify hv_clean_fields or hv_enlightenments_control between
> multiple accesses during nested VM-entry.
>
> The cached values ensure consistent behavior across:
> - The evmcs-to-vmcs12 copy operations
> - MSR bitmap validation
> - Clean field checks in prepare_vmcs02_rare()
>
> This eliminates potential guest-induced inconsistencies in nested
> virtualization state management.
Could you please split the commit to simplify the review and bughunting
if (when?) issues are discovered? In particular, I would've split this
into at least:
- hv_flush_hypercall caching/handling
- hv_msr_bitmap caching/handling
- nested_vmx_is_evmptr12_valid() instead of vmx->nested.hv_evmcs checks
- adding 'bool copy' argument to nested_vmx_handle_enlightened_vmptrld()
and moving copy_enlightened_to_vmcs12() there from nested_vmx_run()
- hv_clean_fields caching/handling.
I want to be extra careful with clean fields. I may had forgotten the
details since I've implemented it but I still remember some pain over
inconsistent state upon migrations, namely
commit d6bf71a18c74de61548ddad44ff95306fe85f829
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Wed May 26 15:20:22 2021 +0200
KVM: nVMX: Ignore 'hv_clean_fields' data when eVMCS data is copied in vmx_get_nested_state()
Unfortunatelly, I don't think we have good test coverage for all this in
'hyperv_evmcs' selftest :-( I've seen you added evmcs support to
'vmx_l2_switch_test' in this series, thanks a lot for doing that!
>
> Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
> ---
> arch/x86/kvm/vmx/hyperv.c | 5 ++--
> arch/x86/kvm/vmx/hyperv.h | 20 +++++++++++++
> arch/x86/kvm/vmx/nested.c | 62 ++++++++++++++++++++++++---------------
> arch/x86/kvm/vmx/vmx.h | 5 +++-
> 4 files changed, 65 insertions(+), 27 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/hyperv.c b/arch/x86/kvm/vmx/hyperv.c
> index fa41d036acd4..961b91b9bd64 100644
> --- a/arch/x86/kvm/vmx/hyperv.c
> +++ b/arch/x86/kvm/vmx/hyperv.c
> @@ -213,12 +213,11 @@ bool nested_evmcs_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
> {
> struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> - struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
>
> - if (!hv_vcpu || !evmcs)
> + if (!hv_vcpu || !nested_vmx_is_evmptr12_valid(vmx))
> return false;
>
> - if (!evmcs->hv_enlightenments_control.nested_flush_hypercall)
> + if (!vmx->nested.hv_flush_hypercall)
> return false;
>
> return hv_vcpu->vp_assist_page.nested_control.features.directhypercall;
> diff --git a/arch/x86/kvm/vmx/hyperv.h b/arch/x86/kvm/vmx/hyperv.h
> index 11a339009781..3c7fea501ca5 100644
> --- a/arch/x86/kvm/vmx/hyperv.h
> +++ b/arch/x86/kvm/vmx/hyperv.h
> @@ -52,6 +52,16 @@ static inline bool guest_cpu_cap_has_evmcs(struct kvm_vcpu *vcpu)
> to_vmx(vcpu)->nested.enlightened_vmcs_enabled;
> }
>
> +static inline u32 nested_evmcs_clean_fields(struct vcpu_vmx *vmx)
> +{
> + return vmx->nested.hv_clean_fields;
> +}
> +
> +static inline bool nested_evmcs_msr_bitmap(struct vcpu_vmx *vmx)
> +{
> + return vmx->nested.hv_msr_bitmap;
> +}
> +
> u64 nested_get_evmptr(struct kvm_vcpu *vcpu);
> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> @@ -85,6 +95,16 @@ static inline struct hv_enlightened_vmcs *nested_vmx_evmcs(struct vcpu_vmx *vmx)
> {
> return NULL;
> }
> +
> +static inline u32 nested_evmcs_clean_fields(struct vcpu_vmx *vmx)
> +{
> + return 0;
> +}
> +
> +static inline bool nested_evmcs_msr_bitmap(struct vcpu_vmx *vmx)
> +{
> + return false;
> +}
> #endif
>
> #endif /* __KVM_X86_VMX_HYPERV_H */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index cb4b85edcb7a..5790e1a26456 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -236,6 +236,9 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
> kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map);
> vmx->nested.hv_evmcs = NULL;
> vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
> + vmx->nested.hv_clean_fields = 0;
> + vmx->nested.hv_msr_bitmap = false;
> + vmx->nested.hv_flush_hypercall = false;
>
> if (hv_vcpu) {
> hv_vcpu->nested.pa_page_gpa = INVALID_GPA;
> @@ -737,10 +740,10 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
> * and tells KVM (L0) there were no changes in MSR bitmap for L2.
> */
> if (!vmx->nested.force_msr_bitmap_recalc) {
> - struct hv_enlightened_vmcs *evmcs = nested_vmx_evmcs(vmx);
> -
> - if (evmcs && evmcs->hv_enlightenments_control.msr_bitmap &&
> - evmcs->hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP)
> + if (nested_vmx_is_evmptr12_valid(vmx) &&
> + nested_evmcs_msr_bitmap(vmx) &&
> + (nested_evmcs_clean_fields(vmx)
> + & HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP))
> return true;
> }
>
> @@ -2214,10 +2217,11 @@ static void copy_vmcs12_to_enlightened(struct vcpu_vmx *vmx)
> * instruction.
> */
> static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld(
> - struct kvm_vcpu *vcpu, bool from_launch)
> + struct kvm_vcpu *vcpu, bool from_launch, bool copy)
> {
> #ifdef CONFIG_KVM_HYPERV
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> + struct hv_enlightened_vmcs *evmcs;
> bool evmcs_gpa_changed = false;
> u64 evmcs_gpa;
>
> @@ -2297,6 +2301,22 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld(
> vmx->nested.force_msr_bitmap_recalc = true;
> }
>
> + /* Cache evmcs fields to avoid reading evmcs after copy to vmcs12 */
> + evmcs = vmx->nested.hv_evmcs;
> + vmx->nested.hv_clean_fields = evmcs->hv_clean_fields;
> + vmx->nested.hv_flush_hypercall = evmcs->hv_enlightenments_control.nested_flush_hypercall;
> + vmx->nested.hv_msr_bitmap = evmcs->hv_enlightenments_control.msr_bitmap;
> +
> + if (copy) {
> + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +
> + if (likely(!vmcs12->hdr.shadow_vmcs)) {
> + copy_enlightened_to_vmcs12(vmx, vmx->nested.hv_clean_fields);
> + /* Enlightened VMCS doesn't have launch state */
> + vmcs12->launch_state = !from_launch;
> + }
> + }
> +
> return EVMPTRLD_SUCCEEDED;
> #else
> return EVMPTRLD_DISABLED;
> @@ -2655,10 +2675,12 @@ static void vmcs_write_cet_state(struct kvm_vcpu *vcpu, u64 s_cet,
>
> static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> {
> - struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);
> + u32 hv_clean_fields = 0;
>
> - if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
> - HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
> + if (nested_vmx_is_evmptr12_valid(vmx))
> + hv_clean_fields = nested_evmcs_clean_fields(vmx);
> +
> + if (!(hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
>
> vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
> vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> @@ -2700,8 +2722,7 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> vmx_segment_cache_clear(vmx);
> }
>
> - if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
> - HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1)) {
> + if (!(hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1)) {
> vmcs_write32(GUEST_SYSENTER_CS, vmcs12->guest_sysenter_cs);
> vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> vmcs12->guest_pending_dbg_exceptions);
> @@ -2792,7 +2813,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> enum vm_entry_failure_code *entry_failure_code)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> - struct hv_enlightened_vmcs *evmcs = nested_vmx_evmcs(vmx);
> + struct hv_enlightened_vmcs *evmcs;
> bool load_guest_pdptrs_vmcs12 = false;
>
> if (vmx->nested.dirty_vmcs12 || nested_vmx_is_evmptr12_valid(vmx)) {
> @@ -2800,7 +2821,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> vmx->nested.dirty_vmcs12 = false;
>
> load_guest_pdptrs_vmcs12 = !nested_vmx_is_evmptr12_valid(vmx) ||
> - !(evmcs->hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1);
> + !(nested_evmcs_clean_fields(vmx)
> + & HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1);
> }
>
> if (vmx->nested.nested_run_pending &&
> @@ -2929,7 +2951,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> * bits when it changes a field in eVMCS. Mark all fields as clean
> * here.
> */
> - if (nested_vmx_is_evmptr12_valid(vmx))
> + evmcs = nested_vmx_evmcs(vmx);
> + if (evmcs)
> evmcs->hv_clean_fields |= HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
>
> return 0;
> @@ -3477,7 +3500,7 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
> if (guest_cpu_cap_has_evmcs(vcpu) &&
> vmx->nested.hv_evmcs_vmptr == EVMPTR_MAP_PENDING) {
> enum nested_evmptrld_status evmptrld_status =
> - nested_vmx_handle_enlightened_vmptrld(vcpu, false);
> + nested_vmx_handle_enlightened_vmptrld(vcpu, false, false);
>
> if (evmptrld_status == EVMPTRLD_VMFAIL ||
> evmptrld_status == EVMPTRLD_ERROR)
> @@ -3867,7 +3890,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> if (!nested_vmx_check_permission(vcpu))
> return 1;
>
> - evmptrld_status = nested_vmx_handle_enlightened_vmptrld(vcpu, launch);
> + evmptrld_status = nested_vmx_handle_enlightened_vmptrld(vcpu, launch, true);
> if (evmptrld_status == EVMPTRLD_ERROR) {
> kvm_queue_exception(vcpu, UD_VECTOR);
> return 1;
> @@ -3893,15 +3916,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> if (CC(vmcs12->hdr.shadow_vmcs))
> return nested_vmx_failInvalid(vcpu);
>
> - if (nested_vmx_is_evmptr12_valid(vmx)) {
> - struct hv_enlightened_vmcs *evmcs = nested_vmx_evmcs(vmx);
> -
> - copy_enlightened_to_vmcs12(vmx, evmcs->hv_clean_fields);
> - /* Enlightened VMCS doesn't have launch state */
> - vmcs12->launch_state = !launch;
> - } else if (enable_shadow_vmcs) {
> + if (!nested_vmx_is_evmptr12_valid(vmx) && enable_shadow_vmcs)
> copy_shadow_to_vmcs12(vmx);
> - }
>
> /*
> * The nested entry process starts with enforcing various prerequisites
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 90fdf130fd85..cda96196c56c 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -205,8 +205,11 @@ struct nested_vmx {
>
> #ifdef CONFIG_KVM_HYPERV
> gpa_t hv_evmcs_vmptr;
> - struct kvm_host_map hv_evmcs_map;
> + u32 hv_clean_fields;
> + bool hv_msr_bitmap;
> + bool hv_flush_hypercall;
> struct hv_enlightened_vmcs *hv_evmcs;
> + struct kvm_host_map hv_evmcs_map;
> #endif
> };
--
Vitaly
next prev parent reply other threads:[~2026-01-02 15:40 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-02 14:24 [PATCH v4 00/10] KVM: nVMX: Improve performance for unmanaged guest memory Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 01/10] KVM: nVMX: Implement cache for L1 MSR bitmap Fred Griffoul
2026-05-11 23:08 ` Sean Christopherson
2026-01-02 14:24 ` [PATCH v4 02/10] KVM: pfncache: Restore guest-uses-pfn support Fred Griffoul
2026-06-01 15:09 ` David Woodhouse
2026-01-02 14:24 ` [PATCH v4 03/10] KVM: x86: Add nested state validation for pfncache support Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 04/10] KVM: nVMX: Implement cache for L1 APIC pages Fred Griffoul
2026-05-11 23:35 ` Sean Christopherson
2026-01-02 14:24 ` [PATCH v4 05/10] KVM: selftests: Add nested VMX APIC cache invalidation test Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 06/10] KVM: nVMX: Cache evmcs fields to ensure consistency during VM-entry Fred Griffoul
2026-01-02 15:40 ` Vitaly Kuznetsov [this message]
2026-01-02 14:24 ` [PATCH v4 07/10] KVM: nVMX: Replace evmcs kvm_host_map with pfncache Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 08/10] KVM: x86: Add nested context management Fred Griffoul
2026-05-12 0:13 ` Sean Christopherson
2026-01-02 14:24 ` [PATCH v4 09/10] KVM: nVMX: Use nested context for pfncache persistence Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 10/10] KVM: selftests: Add L2 vcpu context switch test Fred Griffoul
2026-05-11 23:56 ` [PATCH v4 00/10] KVM: nVMX: Improve performance for unmanaged guest memory Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a4ywaur0.fsf@redhat.com \
--to=vkuznets@redhat.com \
--cc=dwmw@amazon.co.uk \
--cc=fgriffo@amazon.co.uk \
--cc=griffoul@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=shuah@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.