From: Sean Christopherson <seanjc@google.com>
To: Lei Wang <lei4.wang@intel.com>
Cc: pbonzini@redhat.com, vkuznets@redhat.com, wanpengli@tencent.com,
jmattson@google.com, joro@8bytes.org, chenyi.qiang@intel.com,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 8/8] KVM: VMX: Enable PKS for nested VM
Date: Fri, 20 May 2022 01:24:37 +0000 [thread overview]
Message-ID: <Yobt1XwOfb5M6Dfa@google.com> (raw)
In-Reply-To: <20220424101557.134102-9-lei4.wang@intel.com>
Nit, use "KVM: nVMX:" for the shortlog scope.
On Sun, Apr 24, 2022, Lei Wang wrote:
> @@ -2433,6 +2437,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
> (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
> vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
> +
> + if (vmx->nested.nested_run_pending &&
> + (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))
> + vmcs_write64(GUEST_IA32_PKRS, vmcs12->guest_ia32_pkrs);
As mentioned in the BNDCFGS thread, this does the wrong thing for SMM. But, after
a lot of thought, handling this in nested_vmx_enter_non_root_mode() would be little
more than a band-aid, and a messy one at that, because KVM's SMM emulation is
horrifically broken with respect to nVMX.
Entry does to SMM does not modify _any_ state that is not saved in SMRAM. That
we're having to deal with this crap is a symptom of KVM doing the complete wrong
thing by piggybacking nested_vmx_vmexit() and nested_vmx_enter_non_root_mode().
The SDM's description of CET spells this out very, very clearly:
On processors that support CET shadow stacks, when the processor enters SMM,
the processor saves the SSP register to the SMRAM state save area (see Table 31-3)
and clears CR4.CET to 0. Thus, the initial execution environment of the SMI handler
has CET disabled and all of the CET state of the interrupted program is still in the
machine. An SMM that uses CET is required to save the interrupted program’s CET
state and restore the CET state prior to exiting SMM.
It mostly works because no guest SMM handler does anything with most of the MSRs,
but it's all wildy wrong. A concrete example of a lurking bug is if vmcs12 uses
the VM-Exit MSR load list, in which case the forced nested_vmx_vmexit() will load
state that is never undone.
So, my very strong vote is to ignore SMM and let someone who actually cares about
SMM fix that mess properly by adding custom flows for exiting/re-entering L2 on
SMI/RSM.
> }
>
> if (nested_cpu_has_xsaves(vmcs12))
> @@ -2521,6 +2529,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
> !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
> vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
> + if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
ERROR: trailing whitespace
#85: FILE: arch/x86/kvm/vmx/nested.c:3407:
+^Iif (kvm_cpu_cap_has(X86_FEATURE_PKS) && $
> + (!vmx->nested.nested_run_pending ||
> + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)))
> + vmcs_write64(GUEST_IA32_PKRS, vmx->nested.vmcs01_guest_pkrs);
> +
> vmx_set_rflags(vcpu, vmcs12->guest_rflags);
>
> /* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
> @@ -2897,6 +2910,10 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
> vmcs12->host_ia32_perf_global_ctrl)))
> return -EINVAL;
>
> + if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) &&
> + CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs)))
> + return -EINVAL;
> +
> #ifdef CONFIG_X86_64
> ia32e = !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE);
> #else
> @@ -3049,6 +3066,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> if (nested_check_guest_non_reg_state(vmcs12))
> return -EINVAL;
>
> + if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS) &&
> + CC(!kvm_pkrs_valid(vmcs12->guest_ia32_pkrs)))
> + return -EINVAL;
> +
> return 0;
> }
>
> @@ -3384,6 +3405,10 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> (!from_vmentry ||
> !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
> vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
> + if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
> + (!from_vmentry ||
This should be "!vmx->nested.nested_run_pending" instead of "!from_vmentry" to
avoid the unnecessary VMREAD when restoring L2 with a pending VM-Enter.
> + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)))
> + vmx->nested.vmcs01_guest_pkrs = vmcs_read64(GUEST_IA32_PKRS);
>
> /*
> * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
...
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 91723a226bf3..82f79ac46d7b 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -222,6 +222,8 @@ struct nested_vmx {
> u64 vmcs01_debugctl;
> u64 vmcs01_guest_bndcfgs;
>
Please pack these together, i.e. don't have a blank line between the various
vmcs01_* fields.
> + u64 vmcs01_guest_pkrs;
> +
> /* to migrate it to L1 if L2 writes to L1's CR8 directly */
> int l1_tpr_threshold;
>
> --
> 2.25.1
>
next prev parent reply other threads:[~2022-05-20 1:24 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-24 10:15 [PATCH v7 0/8] KVM: PKS Virtualization support Lei Wang
2022-04-24 10:15 ` [PATCH v7 1/8] KVM: VMX: Introduce PKS VMCS fields Lei Wang
2022-05-24 20:55 ` Sean Christopherson
2022-05-27 1:55 ` Wang, Lei
2022-04-24 10:15 ` [PATCH v7 2/8] KVM: VMX: Add proper cache tracking for PKRS Lei Wang
2022-05-24 21:00 ` Sean Christopherson
2022-05-27 2:16 ` Wang, Lei
2022-04-24 10:15 ` [PATCH v7 3/8] KVM: X86: Expose IA32_PKRS MSR Lei Wang
2022-05-24 22:11 ` Sean Christopherson
2022-05-27 9:21 ` Wang, Lei
2022-04-24 10:15 ` [PATCH v7 4/8] KVM: MMU: Rename the pkru to pkr Lei Wang
2022-04-24 10:15 ` [PATCH v7 5/8] KVM: MMU: Add helper function to get pkr bits Lei Wang
2022-05-24 23:21 ` Sean Christopherson
2022-05-27 9:28 ` Wang, Lei
2022-04-24 10:15 ` [PATCH v7 6/8] KVM: MMU: Add support for PKS emulation Lei Wang
2022-05-24 23:28 ` Sean Christopherson
2022-05-27 9:40 ` Wang, Lei
2022-04-24 10:15 ` [PATCH v7 7/8] KVM: VMX: Expose PKS to guest Lei Wang
2022-05-24 23:34 ` Sean Christopherson
2022-05-27 9:42 ` Wang, Lei
2022-04-24 10:15 ` [PATCH v7 8/8] KVM: VMX: Enable PKS for nested VM Lei Wang
2022-05-20 1:24 ` Sean Christopherson [this message]
2022-05-27 9:55 ` Wang, Lei
2022-05-06 7:32 ` [PATCH v7 0/8] KVM: PKS Virtualization support Wang, Lei
2025-11-10 16:29 ` The current status of PKS virtualization Ruihan Li
2025-11-10 20:44 ` Paolo Bonzini
2025-11-11 1:14 ` Ruihan Li
2025-11-11 5:40 ` Chenyi Qiang
2025-11-11 14:24 ` Ruihan Li
2025-11-12 1:06 ` Chenyi Qiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yobt1XwOfb5M6Dfa@google.com \
--to=seanjc@google.com \
--cc=chenyi.qiang@intel.com \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=lei4.wang@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).