* [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry
@ 2019-05-07 16:06 Sean Christopherson
2019-05-07 16:06 ` [PATCH 01/15] KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped Sean Christopherson
` (15 more replies)
0 siblings, 16 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
The majority of patches in this series are loosely related optimizations
to pick off low(ish) hanging fruit in nested VM-Entry, e.g. there are
many VMREADs and VMWRITEs that can be optimized away without too much
effort.
The major change (in terms of performance) is to not "put" the vCPU
state when switching between vmcs01 and vmcs02, which can reduce the
latency of a nested VM-Entry by upwards of 1000 cycles.
A few bug fixes are prepended as they touch code that happens to be
modified by the various optimizations.
Sean Christopherson (15):
KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped
KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value
KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01
KVM: nVMX: Write ENCLS-exiting bitmap once per vmcs02
KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry
KVM: nVMX: Don't "put" vCPU or host state when switching VMCS
KVM: nVMX: Don't reread VMCS-agnostic state when switching VMCS
KVM: nVMX: Don't speculatively write virtual-APIC page address
KVM: nVMX: Don't speculatively write APIC-access page address
KVM: nVMX: Update vmcs12 for MSR_IA32_CR_PAT when it's written
KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written
KVM: nVMX: Update vmcs12 for MSR_IA32_DEBUGCTLMSR when it's written
KVM: nVMX: Update vmcs02 GUEST_IA32_DEBUGCTL only when vmcs12 is dirty
KVM: nVMX: Don't update GUEST_BNDCFGS if it's clean in HV eVMCS
KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary
arch/x86/kvm/vmx/nested.c | 142 +++++++++++++++++++-------------------
arch/x86/kvm/vmx/vmx.c | 93 +++++++++++++++++--------
arch/x86/kvm/vmx/vmx.h | 5 +-
3 files changed, 136 insertions(+), 104 deletions(-)
--
2.21.0
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 01/15] KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 20:09 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 02/15] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value Sean Christopherson
` (14 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
... as a malicious userspace can run a toy guest to generate invalid
virtual-APIC page addresses in L1, i.e. flood the kernel log with error
messages.
Fixes: 690908104e39d ("KVM: nVMX: allow tests to use bad virtual-APIC page address")
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 04b40a98f60b..63f2ca847f05 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2875,9 +2875,6 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
*/
vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
CPU_BASED_TPR_SHADOW);
- } else {
- printk("bad virtual-APIC page address\n");
- dump_vmcs();
}
}
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 02/15] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
2019-05-07 16:06 ` [PATCH 01/15] KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 03/15] KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01 Sean Christopherson
` (13 subsequent siblings)
15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 60306f19105d..f3b0f4445af7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1896,9 +1896,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 03/15] KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
2019-05-07 16:06 ` [PATCH 01/15] KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped Sean Christopherson
2019-05-07 16:06 ` [PATCH 02/15] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 04/15] KVM: nVMX: Write ENCLS-exiting bitmap once per vmcs02 Sean Christopherson
` (12 subsequent siblings)
15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
If L1 does not set VM_ENTRY_LOAD_BNDCFGS, then L1's BNDCFGS value must
be propagated to vmcs02 since KVM always runs with VM_ENTRY_LOAD_BNDCFGS
when MPX is supported. Because the value effectively comes from vmcs01,
vmcs02 must be updated even if vmcs12 is clean.
Fixes: 62cf9bd8118c4 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 63f2ca847f05..9c31e82fb7c5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2231,13 +2231,9 @@ static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
set_cr4_guest_host_mask(vmx);
- if (kvm_mpx_supported()) {
- if (vmx->nested.nested_run_pending &&
- (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
- vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
- else
- vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
- }
+ if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
+ (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
+ vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
}
/*
@@ -2280,6 +2276,9 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.vmcs01_debugctl);
}
+ if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
+ vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
vmx_set_rflags(vcpu, vmcs12->guest_rflags);
/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 04/15] KVM: nVMX: Write ENCLS-exiting bitmap once per vmcs02
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (2 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 03/15] KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01 Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 05/15] KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry Sean Christopherson
` (11 subsequent siblings)
15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
KVM doesn't yet support SGX virtualization, i.e. writes a constant value
to ENCLS_EXITING_BITMAP so that it can intercept ENCLS and inject a #UD.
Fixes: 0b665d3040281 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9c31e82fb7c5..094d139579fb 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1948,6 +1948,9 @@ static void prepare_vmcs02_constant_state(struct vcpu_vmx *vmx)
if (enable_pml)
vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
+ if (cpu_has_vmx_encls_vmexit())
+ vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
+
/*
* Set the MSR load/store lists to match L0's settings. Only the
* addresses are constant (for vmcs02), the counts can change based
@@ -2070,9 +2073,6 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)
vmcs_write64(APIC_ACCESS_ADDR, -1ull);
- if (exec_control & SECONDARY_EXEC_ENCLS_EXITING)
- vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
-
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
}
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 05/15] KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (3 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 04/15] KVM: nVMX: Write ENCLS-exiting bitmap once per vmcs02 Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-06-06 15:49 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS Sean Christopherson
` (10 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
Emulation of GUEST_PML_INDEX for a nested VMM is a bit weird. Because
L0 flushes the PML log on every VM-Exit, the value in vmcs02 at the time of
VM-Enter is a constant -1, regardless of what L1 thinks/wants.
Fixes: 09abe32002665 ("KVM: nVMX: split pieces of prepare_vmcs02() to prepare_vmcs02_early()")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 094d139579fb..a30d53823b2e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1945,8 +1945,16 @@ static void prepare_vmcs02_constant_state(struct vcpu_vmx *vmx)
if (cpu_has_vmx_msr_bitmap())
vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
- if (enable_pml)
+ /*
+ * Conceptually we want to copy the PML address and index from vmcs01
+ * here, and then back to vmcs01 on nested vmexit. But since we always
+ * flush the log on each vmexit and never change the PML address (once
+ * set), both fields are effectively constant in vmcs02.
+ */
+ if (enable_pml) {
vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
+ vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
+ }
if (cpu_has_vmx_encls_vmexit())
vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
@@ -2106,16 +2114,6 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
exec_control |= VM_EXIT_LOAD_IA32_EFER;
vm_exit_controls_init(vmx, exec_control);
- /*
- * Conceptually we want to copy the PML address and index from
- * vmcs01 here, and then back to vmcs01 on nested vmexit. But,
- * since we always flush the log on each vmexit and never change
- * the PML address (once set), this happens to be equivalent to
- * simply resetting the index in vmcs02.
- */
- if (enable_pml)
- vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
-
/*
* Interrupt/Exception Fields
*/
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (4 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 05/15] KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-06-06 16:24 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 07/15] KVM: nVMX: Don't reread VMCS-agnostic " Sean Christopherson
` (9 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
When switching between vmcs01 and vmcs02, KVM isn't actually switching
between guest and host. If guest state is already loaded (the likely,
if not guaranteed, case), keep the guest state loaded and manually swap
the loaded_cpu_state pointer after propagating saved host state to the
new vmcs0{1,2}.
Avoiding the switch between guest and host reduces the latency of
switching between vmcs01 and vmcs02 by several hundred cycles, and
reduces the roundtrip time of a nested VM by upwards of 1000 cycles.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 18 +++++++++++++-
arch/x86/kvm/vmx/vmx.c | 52 ++++++++++++++++++++++-----------------
arch/x86/kvm/vmx/vmx.h | 3 ++-
3 files changed, 48 insertions(+), 25 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a30d53823b2e..4651d3462df4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -241,15 +241,31 @@ static void free_nested(struct kvm_vcpu *vcpu)
static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct vmcs_host_state *src;
+ struct loaded_vmcs *prev;
int cpu;
if (vmx->loaded_vmcs == vmcs)
return;
cpu = get_cpu();
- vmx_vcpu_put(vcpu);
+ prev = vmx->loaded_cpu_state;
vmx->loaded_vmcs = vmcs;
vmx_vcpu_load(vcpu, cpu);
+
+ if (likely(prev)) {
+ src = &prev->host_state;
+
+ vmx_set_host_fs_gs(&vmcs->host_state, src->fs_sel, src->gs_sel,
+ src->fs_base, src->gs_base);
+
+ vmcs->host_state.ldt_sel = src->ldt_sel;
+#ifdef CONFIG_X86_64
+ vmcs->host_state.ds_sel = src->ds_sel;
+ vmcs->host_state.es_sel = src->es_sel;
+#endif
+ vmx->loaded_cpu_state = vmcs;
+ }
put_cpu();
vm_entry_controls_reset_shadow(vmx);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f3b0f4445af7..b97666731425 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1035,6 +1035,33 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
}
+void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
+ unsigned long fs_base, unsigned long gs_base)
+{
+ if (unlikely(fs_sel != host->fs_sel)) {
+ if (!(fs_sel & 7))
+ vmcs_write16(HOST_FS_SELECTOR, fs_sel);
+ else
+ vmcs_write16(HOST_FS_SELECTOR, 0);
+ host->fs_sel = fs_sel;
+ }
+ if (unlikely(gs_sel != host->gs_sel)) {
+ if (!(gs_sel & 7))
+ vmcs_write16(HOST_GS_SELECTOR, gs_sel);
+ else
+ vmcs_write16(HOST_GS_SELECTOR, 0);
+ host->gs_sel = gs_sel;
+ }
+ if (unlikely(fs_base != host->fs_base)) {
+ vmcs_writel(HOST_FS_BASE, fs_base);
+ host->fs_base = fs_base;
+ }
+ if (unlikely(gs_base != host->gs_base)) {
+ vmcs_writel(HOST_GS_BASE, gs_base);
+ host->gs_base = gs_base;
+ }
+}
+
void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -1100,28 +1127,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
gs_base = segment_base(gs_sel);
#endif
- if (unlikely(fs_sel != host_state->fs_sel)) {
- if (!(fs_sel & 7))
- vmcs_write16(HOST_FS_SELECTOR, fs_sel);
- else
- vmcs_write16(HOST_FS_SELECTOR, 0);
- host_state->fs_sel = fs_sel;
- }
- if (unlikely(gs_sel != host_state->gs_sel)) {
- if (!(gs_sel & 7))
- vmcs_write16(HOST_GS_SELECTOR, gs_sel);
- else
- vmcs_write16(HOST_GS_SELECTOR, 0);
- host_state->gs_sel = gs_sel;
- }
- if (unlikely(fs_base != host_state->fs_base)) {
- vmcs_writel(HOST_FS_BASE, fs_base);
- host_state->fs_base = fs_base;
- }
- if (unlikely(gs_base != host_state->gs_base)) {
- vmcs_writel(HOST_GS_BASE, gs_base);
- host_state->gs_base = gs_base;
- }
+ vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
}
static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
@@ -1310,7 +1316,7 @@ static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
pi_set_sn(pi_desc);
}
-void vmx_vcpu_put(struct kvm_vcpu *vcpu)
+static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
{
vmx_vcpu_pi_put(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 63d37ccce3dc..f81b32ae1822 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -293,11 +293,12 @@ struct kvm_vmx {
bool nested_vmx_allowed(struct kvm_vcpu *vcpu);
void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
-void vmx_vcpu_put(struct kvm_vcpu *vcpu);
int allocate_vpid(void);
void free_vpid(int vpid);
void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
+void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
+ unsigned long fs_base, unsigned long gs_base);
int vmx_get_cpl(struct kvm_vcpu *vcpu);
unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 07/15] KVM: nVMX: Don't reread VMCS-agnostic state when switching VMCS
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (5 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 21:01 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 08/15] KVM: nVMX: Don't speculatively write virtual-APIC page address Sean Christopherson
` (8 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
When switching between vmcs01 and vmcs02, there is no need to update
state tracking for values that aren't tied to any particular VMCS as
the per-vCPU values are already up-to-date (vmx_switch_vmcs() can only
be called when the vCPU is loaded).
Avoiding the update eliminates a RDMSR, and potentially a RDPKRU and
posted-interrupt update (cmpxchg64() and more).
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 18 +++++++++++++-----
arch/x86/kvm/vmx/vmx.h | 2 +-
3 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4651d3462df4..99164d054922 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -251,7 +251,7 @@ static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
cpu = get_cpu();
prev = vmx->loaded_cpu_state;
vmx->loaded_vmcs = vmcs;
- vmx_vcpu_load(vcpu, cpu);
+ __vmx_vcpu_load(vcpu, cpu);
if (likely(prev)) {
src = &prev->host_state;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b97666731425..0c48dee4159b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1231,11 +1231,7 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
pi_set_on(pi_desc);
}
-/*
- * Switches to specified vcpu, until a matching vcpu_put(), but assumes
- * vcpu mutex is already taken.
- */
-void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+void __vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
bool already_loaded = vmx->loaded_vmcs->cpu == cpu;
@@ -1296,8 +1292,20 @@ void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (kvm_has_tsc_control &&
vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
decache_tsc_multiplier(vmx);
+}
+
+/*
+ * Switches to specified vcpu, until a matching vcpu_put(), but assumes
+ * vcpu mutex is already taken.
+ */
+static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ __vmx_vcpu_load(vcpu, cpu);
vmx_vcpu_pi_load(vcpu, cpu);
+
vmx->host_pkru = read_pkru();
vmx->host_debugctlmsr = get_debugctlmsr();
}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index f81b32ae1822..f62a008c9227 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -292,7 +292,7 @@ struct kvm_vmx {
};
bool nested_vmx_allowed(struct kvm_vcpu *vcpu);
-void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
+void __vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
int allocate_vpid(void);
void free_vpid(int vpid);
void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 08/15] KVM: nVMX: Don't speculatively write virtual-APIC page address
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (6 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 07/15] KVM: nVMX: Don't reread VMCS-agnostic " Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 09/15] KVM: nVMX: Don't speculatively write APIC-access " Sean Christopherson
` (7 subsequent siblings)
15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
The VIRTUAL_APIC_PAGE_ADDR in vmcs02 is guaranteed to be updated before
it is consumed by hardware, either in nested_vmx_enter_non_root_mode()
or via the KVM_REQ_GET_VMCS12_PAGES callback. Avoid an extra VMWRITE
and only stuff a bad value into vmcs02 when mapping vmcs12's address
fails. This also eliminates the need for extra comments to connect the
dots between prepare_vmcs02_early() and nested_get_vmcs12_pages().
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 99164d054922..32d233dee067 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2038,20 +2038,13 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
exec_control &= ~CPU_BASED_TPR_SHADOW;
exec_control |= vmcs12->cpu_based_vm_exec_control;
- /*
- * Write an illegal value to VIRTUAL_APIC_PAGE_ADDR. Later, if
- * nested_get_vmcs12_pages can't fix it up, the illegal value
- * will result in a VM entry failure.
- */
- if (exec_control & CPU_BASED_TPR_SHADOW) {
- vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, -1ull);
+ if (exec_control & CPU_BASED_TPR_SHADOW)
vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
- } else {
#ifdef CONFIG_X86_64
+ else
exec_control |= CPU_BASED_CR8_LOAD_EXITING |
CPU_BASED_CR8_STORE_EXITING;
#endif
- }
/*
* A vmexit (to either L1 hypervisor or L0 userspace) is always needed
@@ -2869,10 +2862,6 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
map = &vmx->nested.virtual_apic_map;
- /*
- * If translation failed, VM entry will fail because
- * prepare_vmcs02 set VIRTUAL_APIC_PAGE_ADDR to -1ull.
- */
if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->virtual_apic_page_addr), map)) {
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, pfn_to_hpa(map->pfn));
} else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) &&
@@ -2888,6 +2877,12 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
*/
vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
CPU_BASED_TPR_SHADOW);
+ } else {
+ /*
+ * Write an illegal value to VIRTUAL_APIC_PAGE_ADDR to
+ * force VM-Entry to fail.
+ */
+ vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, -1ull);
}
}
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 09/15] KVM: nVMX: Don't speculatively write APIC-access page address
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (7 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 08/15] KVM: nVMX: Don't speculatively write virtual-APIC page address Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 10/15] KVM: nVMX: Update vmcs12 for MSR_IA32_CR_PAT when it's written Sean Christopherson
` (6 subsequent siblings)
15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
If nested_get_vmcs12_pages() fails to map L1's APIC_ACCESS_ADDR into
L2, then it disables SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES in vmcs02.
In other words, the APIC_ACCESS_ADDR in vmcs02 is guaranteed to be
written with the correct value before being consumed by hardware; drop
the unnecessary VMWRITE.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 32d233dee067..29892c560771 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2082,14 +2082,6 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
vmcs_write16(GUEST_INTR_STATUS,
vmcs12->guest_intr_status);
- /*
- * Write an illegal value to APIC_ACCESS_ADDR. Later,
- * nested_get_vmcs12_pages will either fix it up or
- * remove the VM execution control.
- */
- if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)
- vmcs_write64(APIC_ACCESS_ADDR, -1ull);
-
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
}
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 10/15] KVM: nVMX: Update vmcs12 for MSR_IA32_CR_PAT when it's written
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (8 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 09/15] KVM: nVMX: Don't speculatively write APIC-access " Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 11/15] KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written Sean Christopherson
` (5 subsequent siblings)
15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
As alluded to by the TODO comment, KVM unconditionally intercepts writes
to the PAT MSR. In the unlikely event that L1 allows L2 to write L1's
PAT directly but saves L2's PAT on VM-Exit, update vmcs12 when L2 writes
the PAT. This eliminates the need to VMREAD the value from vmcs02 on
VM-Exit as vmcs12 is already up to date in all situations.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 4 ----
arch/x86/kvm/vmx/vmx.c | 4 ++++
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 29892c560771..135773679d5b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3483,10 +3483,6 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmcs12->guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
}
- /* TODO: These cannot have changed unless we have MSR bitmaps and
- * the relevant bit asks not to trap the change */
- if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_PAT)
- vmcs12->guest_ia32_pat = vmcs_read64(GUEST_IA32_PAT);
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
vmcs12->guest_ia32_efer = vcpu->arch.efer;
vmcs12->guest_sysenter_cs = vmcs_read32(GUEST_SYSENTER_CS);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0c48dee4159b..baa79c5a8ce7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1913,6 +1913,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!kvm_pat_valid(data))
return 1;
+ if (is_guest_mode(vcpu) &&
+ get_vmcs12(vcpu)->vm_exit_controls & VM_EXIT_SAVE_IA32_PAT)
+ get_vmcs12(vcpu)->guest_ia32_pat = data;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 11/15] KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (9 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 10/15] KVM: nVMX: Update vmcs12 for MSR_IA32_CR_PAT when it's written Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-06-06 16:35 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 12/15] KVM: nVMX: Update vmcs12 for MSR_IA32_DEBUGCTLMSR when it's written Sean Christopherson
` (4 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
For L2, KVM always intercepts WRMSR to SYSENTER MSRs. Update vmcs12 in
the WRMSR handler so that they don't need to be (re)read from vmcs02 on
every nested VM-Exit.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 3 ---
arch/x86/kvm/vmx/vmx.c | 6 ++++++
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 135773679d5b..2e9c7bc3fb1f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3485,9 +3485,6 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
vmcs12->guest_ia32_efer = vcpu->arch.efer;
- vmcs12->guest_sysenter_cs = vmcs_read32(GUEST_SYSENTER_CS);
- vmcs12->guest_sysenter_esp = vmcs_readl(GUEST_SYSENTER_ESP);
- vmcs12->guest_sysenter_eip = vmcs_readl(GUEST_SYSENTER_EIP);
if (kvm_mpx_supported())
vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index baa79c5a8ce7..6db16ca1b43d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1831,12 +1831,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
#endif
case MSR_IA32_SYSENTER_CS:
+ if (is_guest_mode(vcpu))
+ get_vmcs12(vcpu)->guest_sysenter_cs = data;
vmcs_write32(GUEST_SYSENTER_CS, data);
break;
case MSR_IA32_SYSENTER_EIP:
+ if (is_guest_mode(vcpu))
+ get_vmcs12(vcpu)->guest_sysenter_eip = data;
vmcs_writel(GUEST_SYSENTER_EIP, data);
break;
case MSR_IA32_SYSENTER_ESP:
+ if (is_guest_mode(vcpu))
+ get_vmcs12(vcpu)->guest_sysenter_esp = data;
vmcs_writel(GUEST_SYSENTER_ESP, data);
break;
case MSR_IA32_POWER_CTL:
--
2.21.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 12/15] KVM: nVMX: Update vmcs12 for MSR_IA32_DEBUGCTLMSR when it's written
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (10 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 11/15] KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 13/15] KVM: nVMX: Update vmcs02 GUEST_IA32_DEBUGCTL only when vmcs12 is dirty Sean Christopherson
` (3 subsequent siblings)
15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
KVM unconditionally intercepts WRMSR to MSR_IA32_DEBUGCTLMSR. In the
unlikely event that L1 allows L2 to write L1's MSR_IA32_DEBUGCTLMSR but
saves L2's value on VM-Exit, update vmcs12 during L2's WRMSR so as
to eliminate the need to VMREAD the value from vmcs02 on nested VM-Exit.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 4 +---
arch/x86/kvm/vmx/vmx.c | 8 ++++++++
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2e9c7bc3fb1f..2e9f8169d40a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3478,10 +3478,8 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
- if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS) {
+ if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
kvm_get_dr(vcpu, 7, (unsigned long *)&vmcs12->guest_dr7);
- vmcs12->guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
- }
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
vmcs12->guest_ia32_efer = vcpu->arch.efer;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6db16ca1b43d..520bf30ff092 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1848,6 +1848,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_POWER_CTL:
vmx->msr_ia32_power_ctl = data;
break;
+ case MSR_IA32_DEBUGCTLMSR:
+ if (is_guest_mode(vcpu) && get_vmcs12(vcpu)->vm_exit_controls &
+ VM_EXIT_SAVE_DEBUG_CONTROLS)
+ get_vmcs12(vcpu)->guest_ia32_debugctl = data;
+
+ ret = kvm_set_msr_common(vcpu, msr_info);
+ break;
+
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
(!msr_info->host_initiated &&
--
2.21.0
* [PATCH 13/15] KVM: nVMX: Update vmcs02 GUEST_IA32_DEBUGCTL only when vmcs12 is dirty
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (11 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 12/15] KVM: nVMX: Update vmcs12 for MSR_IA32_DEBUGCTLMSR when it's written Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-06-06 16:39 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 14/15] KVM: nVMX: Don't update GUEST_BNDCFGS if it's clean in HV eVMCS Sean Christopherson
` (2 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
VMWRITEs to GUEST_IA32_DEBUGCTL from L1 are always intercepted, and
unlike GUEST_DR7 there is no funky logic for determining the value.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2e9f8169d40a..58717dfe82c9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2194,6 +2194,11 @@ static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
}
+
+ if (vmx->nested.nested_run_pending &&
+ (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
+ vmcs_write64(GUEST_IA32_DEBUGCTL,
+ vmcs12->guest_ia32_debugctl);
}
if (nested_cpu_has_xsaves(vmcs12))
@@ -2270,7 +2275,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.vmcs01_debugctl);
--
2.21.0
* [PATCH 14/15] KVM: nVMX: Don't update GUEST_BNDCFGS if it's clean in HV eVMCS
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (12 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 13/15] KVM: nVMX: Update vmcs02 GUEST_IA32_DEBUGCTL only when vmcs12 is dirty Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 15/15] KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary Sean Christopherson
2019-06-06 16:54 ` [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Paolo Bonzini
15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
L1 is responsible for dirtying GUEST_GRP1 if it writes GUEST_BNDCFGS.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 58717dfe82c9..cfdc04fde8eb 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2199,6 +2199,10 @@ static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
vmcs_write64(GUEST_IA32_DEBUGCTL,
vmcs12->guest_ia32_debugctl);
+
+ if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
+ (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
+ vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
}
if (nested_cpu_has_xsaves(vmcs12))
@@ -2234,10 +2238,6 @@ static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
set_cr4_guest_host_mask(vmx);
-
- if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
- (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
- vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
}
/*
--
2.21.0
* [PATCH 15/15] KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (13 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 14/15] KVM: nVMX: Don't update GUEST_BNDCFGS if it's clean in HV eVMCS Sean Christopherson
@ 2019-05-07 16:06 ` Sean Christopherson
2019-06-06 16:53 ` Paolo Bonzini
2019-06-06 16:54 ` [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Paolo Bonzini
15 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-05-07 16:06 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
Per Intel's SDM:
... the logical processor uses PAE paging if CR0.PG=1, CR4.PAE=1 and
IA32_EFER.LME=0. A VM entry to a guest that uses PAE paging loads the
PDPTEs into internal, non-architectural registers based on the setting
of the "enable EPT" VM-execution control.
and:
[GUEST_PDPTR] values are saved into the four PDPTE fields as follows:
- If the "enable EPT" VM-execution control is 0 or the logical
processor was not using PAE paging at the time of the VM exit,
the values saved are undefined.
In other words, if EPT is disabled or the guest isn't using PAE paging,
then the PDPTRs aren't consumed by hardware on VM-Entry and are loaded
with junk on VM-Exit. From a nesting perspective, all of the above hold
true, i.e. KVM can effectively ignore the VMCS PDPTRs. E.g. KVM already
loads the PDPTRs from memory when nested EPT is disabled (see
nested_vmx_load_cr3()).
Because KVM intercepts setting CR4.PAE, there is no danger of consuming
a stale value or crushing L1's VMWRITEs regardless of whether L1
intercepts CR4.PAE. The vmcs12's values are unchanged up until the
VM-Exit where L2 sets CR4.PAE, i.e. L0 will see the new PAE state on the
subsequent VM-Entry and propagate the PDPTRs from vmcs12 to vmcs02.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
arch/x86/kvm/vmx/nested.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cfdc04fde8eb..b8bd446b2c8b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2184,17 +2184,6 @@ static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->guest_sysenter_esp);
vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->guest_sysenter_eip);
- /*
- * L1 may access the L2's PDPTR, so save them to construct
- * vmcs12
- */
- if (enable_ept) {
- vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
- vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
- vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
- vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
- }
-
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
vmcs_write64(GUEST_IA32_DEBUGCTL,
@@ -2256,10 +2245,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct hv_enlightened_vmcs *hv_evmcs = vmx->nested.hv_evmcs;
+ bool load_guest_pdptrs_vmcs12 = false;
if (vmx->nested.dirty_vmcs12 || vmx->nested.hv_evmcs) {
prepare_vmcs02_full(vmx, vmcs12);
vmx->nested.dirty_vmcs12 = false;
+
+ load_guest_pdptrs_vmcs12 = !vmx->nested.hv_evmcs ||
+ !(vmx->nested.hv_evmcs->hv_clean_fields &
+ HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1);
}
/*
@@ -2366,6 +2360,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
return -EINVAL;
}
+ /* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
+ if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
+ !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu)) {
+ vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
+ vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
+ vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
+ vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
+ }
+
/* Shadow page tables on either EPT or shadow page tables. */
if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
entry_failure_code))
@@ -3467,10 +3470,13 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
*/
if (enable_ept) {
vmcs12->guest_cr3 = vmcs_readl(GUEST_CR3);
- vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
- vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
- vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
- vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3);
+ if (nested_cpu_has_ept(vmcs12) && !is_long_mode(vcpu) &&
+ is_pae(vcpu) && is_paging(vcpu)) {
+ vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
+ vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
+ vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
+ vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3);
+ }
}
vmcs12->guest_linear_address = vmcs_readl(GUEST_LINEAR_ADDRESS);
--
2.21.0
* Re: [PATCH 01/15] KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped
2019-05-07 16:06 ` [PATCH 01/15] KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped Sean Christopherson
@ 2019-05-07 20:09 ` Paolo Bonzini
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2019-05-07 20:09 UTC (permalink / raw)
To: Sean Christopherson, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
On 07/05/19 11:06, Sean Christopherson wrote:
> ... as a malicious userspace can run a toy guest to generate invalid
> virtual-APIC page addresses in L1, i.e. flood the kernel log with error
> messages.
>
> Fixes: 690908104e39d ("KVM: nVMX: allow tests to use bad virtual-APIC page address")
> Cc: stable@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
The same is true even of dump_vmcs caused by emulation failures. I'm
thinking of just hiding dump_vmcs beneath a module parameter.
Paolo
* Re: [PATCH 07/15] KVM: nVMX: Don't reread VMCS-agnostic state when switching VMCS
2019-05-07 16:06 ` [PATCH 07/15] KVM: nVMX: Don't reread VMCS-agnostic " Sean Christopherson
@ 2019-05-07 21:01 ` Paolo Bonzini
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2019-05-07 21:01 UTC (permalink / raw)
To: Sean Christopherson, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
On 07/05/19 11:06, Sean Christopherson wrote:
> -void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> +void __vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
Let's call this vmx_vcpu_load_vmcs.
Paolo
* Re: [PATCH 05/15] KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry
2019-05-07 16:06 ` [PATCH 05/15] KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry Sean Christopherson
@ 2019-06-06 15:49 ` Paolo Bonzini
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2019-06-06 15:49 UTC (permalink / raw)
To: Sean Christopherson, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
On 07/05/19 18:06, Sean Christopherson wrote:
> - if (enable_pml)
> + /*
> + * Conceptually we want to copy the PML address and index from vmcs01
> + * here, and then back to vmcs01 on nested vmexit. But since we always
> + * flush the log on each vmexit and never change the PML address (once
> + * set), both fields are effectively constant in vmcs02.
> + */
> + if (enable_pml) {
> vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
> + vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
> + }
Yeah, it will be rewritten in vmx_flush_pml_buffer.
Just a little rephrasing of the comment:
+ * The PML address never changes, so it is constant in vmcs02.
+ * Conceptually we want to copy the PML index from vmcs01 here,
+ * and then back to vmcs01 on nested vmexit. But since we flush
+ * the log and reset GUEST_PML_INDEX on each vmexit, the PML
+ * index is also effectively constant in vmcs02.
Paolo
* Re: [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS
2019-05-07 16:06 ` [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS Sean Christopherson
@ 2019-06-06 16:24 ` Paolo Bonzini
2019-06-06 18:57 ` Sean Christopherson
0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2019-06-06 16:24 UTC (permalink / raw)
To: Sean Christopherson, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
On 07/05/19 18:06, Sean Christopherson wrote:
> When switching between vmcs01 and vmcs02, KVM isn't actually switching
> between guest and host. If guest state is already loaded (the likely,
> if not guaranteed, case), keep the guest state loaded and manually swap
> the loaded_cpu_state pointer after propagating saved host state to the
> new vmcs0{1,2}.
>
> Avoiding the switch between guest and host reduces the latency of
> switching between vmcs01 and vmcs02 by several hundred cycles, and
> reduces the roundtrip time of a nested VM by upwards of 1000 cycles.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
> arch/x86/kvm/vmx/nested.c | 18 +++++++++++++-
> arch/x86/kvm/vmx/vmx.c | 52 ++++++++++++++++++++++-----------------
> arch/x86/kvm/vmx/vmx.h | 3 ++-
> 3 files changed, 48 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index a30d53823b2e..4651d3462df4 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -241,15 +241,31 @@ static void free_nested(struct kvm_vcpu *vcpu)
> static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> + struct vmcs_host_state *src;
> + struct loaded_vmcs *prev;
> int cpu;
>
> if (vmx->loaded_vmcs == vmcs)
> return;
>
> cpu = get_cpu();
> - vmx_vcpu_put(vcpu);
> + prev = vmx->loaded_cpu_state;
> vmx->loaded_vmcs = vmcs;
> vmx_vcpu_load(vcpu, cpu);
> +
> + if (likely(prev)) {
> + src = &prev->host_state;
> +
> + vmx_set_host_fs_gs(&vmcs->host_state, src->fs_sel, src->gs_sel,
> + src->fs_base, src->gs_base);
> +
> + vmcs->host_state.ldt_sel = src->ldt_sel;
> +#ifdef CONFIG_X86_64
> + vmcs->host_state.ds_sel = src->ds_sel;
> + vmcs->host_state.es_sel = src->es_sel;
> +#endif
> + vmx->loaded_cpu_state = vmcs;
> + }
> put_cpu();
I'd like to extract this into a separate function:
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 438fae1fef2a..83e436f201bf 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -248,34 +248,40 @@ static void free_nested(struct kvm_vcpu *vcpu)
free_loaded_vmcs(&vmx->nested.vmcs02);
}
+static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx)
+{
+ struct loaded_vmcs *prev = vmx->loaded_cpu_state;
+ struct loaded_vmcs *cur;
+ struct vmcs_host_state *dest, *src;
+
+ if (unlikely(!prev))
+ return;
+
+ cur = vmx->loaded_vmcs;
+ src = &prev->host_state;
+ dest = &cur->host_state;
+
+ vmx_set_host_fs_gs(dest, src->fs_sel, src->gs_sel, src->fs_base, src->gs_base);
+ dest->ldt_sel = src->ldt_sel;
+#ifdef CONFIG_X86_64
+ dest->ds_sel = src->ds_sel;
+ dest->es_sel = src->es_sel;
+#endif
+ vmx->loaded_cpu_state = cur;
+}
+
static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- struct vmcs_host_state *src;
- struct loaded_vmcs *prev;
int cpu;
if (vmx->loaded_vmcs == vmcs)
return;
cpu = get_cpu();
- prev = vmx->loaded_cpu_state;
vmx->loaded_vmcs = vmcs;
vmx_vcpu_load(vcpu, cpu);
-
- if (likely(prev)) {
- src = &prev->host_state;
-
- vmx_set_host_fs_gs(&vmcs->host_state, src->fs_sel, src->gs_sel,
- src->fs_base, src->gs_base);
-
- vmcs->host_state.ldt_sel = src->ldt_sel;
-#ifdef CONFIG_X86_64
- vmcs->host_state.ds_sel = src->ds_sel;
- vmcs->host_state.es_sel = src->es_sel;
-#endif
- vmx->loaded_cpu_state = vmcs;
- }
+ vmx_sync_vmcs_host_state(vmx);
put_cpu();
vm_entry_controls_reset_shadow(vmx);
Paolo
* Re: [PATCH 11/15] KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written
2019-05-07 16:06 ` [PATCH 11/15] KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written Sean Christopherson
@ 2019-06-06 16:35 ` Paolo Bonzini
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2019-06-06 16:35 UTC (permalink / raw)
To: Sean Christopherson, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
On 07/05/19 18:06, Sean Christopherson wrote:
> - vmcs12->guest_sysenter_cs = vmcs_read32(GUEST_SYSENTER_CS);
> - vmcs12->guest_sysenter_esp = vmcs_readl(GUEST_SYSENTER_ESP);
> - vmcs12->guest_sysenter_eip = vmcs_readl(GUEST_SYSENTER_EIP);
I moved these a bit earlier, together with all other fields that are
unconditional and simply have to be vmread from the vmcs02 into the vmcs12.
Paolo
> if (kvm_mpx_supported())
> vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
* Re: [PATCH 13/15] KVM: nVMX: Update vmcs02 GUEST_IA32_DEBUGCTL only when vmcs12 is dirty
2019-05-07 16:06 ` [PATCH 13/15] KVM: nVMX: Update vmcs02 GUEST_IA32_DEBUGCTL only when vmcs12 is dirty Sean Christopherson
@ 2019-06-06 16:39 ` Paolo Bonzini
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2019-06-06 16:39 UTC (permalink / raw)
To: Sean Christopherson, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
On 07/05/19 18:06, Sean Christopherson wrote:
> VMWRITEs to GUEST_IA32_DEBUGCTL from L1 are always intercepted, and
> unlike GUEST_DR7 there is no funky logic for determining the value.
>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
> arch/x86/kvm/vmx/nested.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 2e9f8169d40a..58717dfe82c9 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2194,6 +2194,11 @@ static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
> vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
> }
> +
> + if (vmx->nested.nested_run_pending &&
> + (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
> + vmcs_write64(GUEST_IA32_DEBUGCTL,
> + vmcs12->guest_ia32_debugctl);
> }
>
> if (nested_cpu_has_xsaves(vmcs12))
> @@ -2270,7 +2275,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> if (vmx->nested.nested_run_pending &&
> (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
> kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
> - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
> } else {
> kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
> vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.vmcs01_debugctl);
>
I'm passing on this one. It really gets more complicated and I'm not
sure the savings are worth it.
Paolo
* Re: [PATCH 15/15] KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary
2019-05-07 16:06 ` [PATCH 15/15] KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary Sean Christopherson
@ 2019-06-06 16:53 ` Paolo Bonzini
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2019-06-06 16:53 UTC (permalink / raw)
To: Sean Christopherson, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
On 07/05/19 18:06, Sean Christopherson wrote:
> Per Intel's SDM:
>
> ... the logical processor uses PAE paging if CR0.PG=1, CR4.PAE=1 and
> IA32_EFER.LME=0. A VM entry to a guest that uses PAE paging loads the
> PDPTEs into internal, non-architectural registers based on the setting
> of the "enable EPT" VM-execution control.
>
> and:
>
> [GUEST_PDPTR] values are saved into the four PDPTE fields as follows:
>
> - If the "enable EPT" VM-execution control is 0 or the logical
> processor was not using PAE paging at the time of the VM exit,
> the values saved are undefined.
>
> In other words, if EPT is disabled or the guest isn't using PAE paging,
> then the PDPTRs aren't consumed by hardware on VM-Entry and are loaded
> with junk on VM-Exit. From a nesting perspective, all of the above hold
> true, i.e. KVM can effectively ignore the VMCS PDPTRs. E.g. KVM already
> loads the PDPTRs from memory when nested EPT is disabled (see
> nested_vmx_load_cr3()).
>
> Because KVM intercepts setting CR4.PAE, there is no danger of consuming
> a stale value or crushing L1's VMWRITEs regardless of whether L1
> intercepts CR4.PAE. The vmcs12's values are unchanged up until the
> VM-Exit where L2 sets CR4.PAE, i.e. L0 will see the new PAE state on the
> subsequent VM-Entry and propagate the PDPTRs from vmcs12 to vmcs02.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
> arch/x86/kvm/vmx/nested.c | 36 +++++++++++++++++++++---------------
> 1 file changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index cfdc04fde8eb..b8bd446b2c8b 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2184,17 +2184,6 @@ static void prepare_vmcs02_full(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->guest_sysenter_esp);
> vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->guest_sysenter_eip);
>
> - /*
> - * L1 may access the L2's PDPTR, so save them to construct
> - * vmcs12
> - */
> - if (enable_ept) {
> - vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
> - vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
> - vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
> - vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
> - }
> -
> if (vmx->nested.nested_run_pending &&
> (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
> vmcs_write64(GUEST_IA32_DEBUGCTL,
> @@ -2256,10 +2245,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> struct hv_enlightened_vmcs *hv_evmcs = vmx->nested.hv_evmcs;
> + bool load_guest_pdptrs_vmcs12 = false;
>
> if (vmx->nested.dirty_vmcs12 || vmx->nested.hv_evmcs) {
> prepare_vmcs02_full(vmx, vmcs12);
> vmx->nested.dirty_vmcs12 = false;
> +
> + load_guest_pdptrs_vmcs12 = !vmx->nested.hv_evmcs ||
> + !(vmx->nested.hv_evmcs->hv_clean_fields &
> + HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1);
> }
>
> /*
> @@ -2366,6 +2360,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> return -EINVAL;
> }
>
> + /* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
> + if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
> + !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu)) {
> + vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
> + vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
> + vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
> + vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
> + }
This probably should be merged into nested_vmx_load_cr3, but something
for later. I've just sent a patch to create a new is_pae_paging
function that can be used here.
Paolo
> /* Shadow page tables on either EPT or shadow page tables. */
> if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
> entry_failure_code))
> @@ -3467,10 +3470,13 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> */
> if (enable_ept) {
> vmcs12->guest_cr3 = vmcs_readl(GUEST_CR3);
> - vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
> - vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
> - vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
> - vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3);
> + if (nested_cpu_has_ept(vmcs12) && !is_long_mode(vcpu) &&
> + is_pae(vcpu) && is_paging(vcpu)) {
> + vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
> + vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
> + vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
> + vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3);
> + }
> }
>
> vmcs12->guest_linear_address = vmcs_readl(GUEST_LINEAR_ADDRESS);
>
* Re: [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
` (14 preceding siblings ...)
2019-05-07 16:06 ` [PATCH 15/15] KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary Sean Christopherson
@ 2019-06-06 16:54 ` Paolo Bonzini
15 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2019-06-06 16:54 UTC (permalink / raw)
To: Sean Christopherson, Radim Krčmář
Cc: kvm, Nadav Amit, Liran Alon, Vitaly Kuznetsov
On 07/05/19 18:06, Sean Christopherson wrote:
> The majority of patches in this series are loosely related optimizations
> to pick off low(ish) hanging fruit in nested VM-Entry, e.g. there are
> many VMREADs and VMWRITEs that can be optimized away without too much
> effort.
>
> The major change (in terms of performance) is to not "put" the vCPU
> state when switching between vmcs01 and vmcs02, which can reduce the
> latency of a nested VM-Entry by upwards of 1000 cycles.
>
> A few bug fixes are prepended as they touch code that happens to be
> modified by the various optimizations.
I've queued the patches locally, but it will be a few days before I can
give them adequate testing so I have not yet pushed them to kvm/queue.
Paolo
* Re: [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS
2019-06-06 16:24 ` Paolo Bonzini
@ 2019-06-06 18:57 ` Sean Christopherson
2019-06-07 17:00 ` Paolo Bonzini
0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-06-06 18:57 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Radim Krčmář, kvm, Nadav Amit, Liran Alon,
Vitaly Kuznetsov
On Thu, Jun 06, 2019 at 06:24:43PM +0200, Paolo Bonzini wrote:
> On 07/05/19 18:06, Sean Christopherson wrote:
> > When switching between vmcs01 and vmcs02, KVM isn't actually switching
> > between guest and host. If guest state is already loaded (the likely,
> > if not guaranteed, case), keep the guest state loaded and manually swap
> > the loaded_cpu_state pointer after propagating saved host state to the
> > new vmcs0{1,2}.
> >
> > Avoiding the switch between guest and host reduces the latency of
> > switching between vmcs01 and vmcs02 by several hundred cycles, and
> > reduces the roundtrip time of a nested VM by upwards of 1000 cycles.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> > arch/x86/kvm/vmx/nested.c | 18 +++++++++++++-
> > arch/x86/kvm/vmx/vmx.c | 52 ++++++++++++++++++++++-----------------
> > arch/x86/kvm/vmx/vmx.h | 3 ++-
> > 3 files changed, 48 insertions(+), 25 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index a30d53823b2e..4651d3462df4 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -241,15 +241,31 @@ static void free_nested(struct kvm_vcpu *vcpu)
> > static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
> > {
> > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > + struct vmcs_host_state *src;
> > + struct loaded_vmcs *prev;
> > int cpu;
> >
> > if (vmx->loaded_vmcs == vmcs)
> > return;
> >
> > cpu = get_cpu();
> > - vmx_vcpu_put(vcpu);
> > + prev = vmx->loaded_cpu_state;
> > vmx->loaded_vmcs = vmcs;
> > vmx_vcpu_load(vcpu, cpu);
> > +
> > + if (likely(prev)) {
> > + src = &prev->host_state;
> > +
> > + vmx_set_host_fs_gs(&vmcs->host_state, src->fs_sel, src->gs_sel,
> > + src->fs_base, src->gs_base);
> > +
> > + vmcs->host_state.ldt_sel = src->ldt_sel;
> > +#ifdef CONFIG_X86_64
> > + vmcs->host_state.ds_sel = src->ds_sel;
> > + vmcs->host_state.es_sel = src->es_sel;
> > +#endif
> > + vmx->loaded_cpu_state = vmcs;
> > + }
> > put_cpu();
>
> I'd like to extract this into a separate function:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 438fae1fef2a..83e436f201bf 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -248,34 +248,40 @@ static void free_nested(struct kvm_vcpu *vcpu)
> free_loaded_vmcs(&vmx->nested.vmcs02);
> }
>
> +static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx)
What about taking the vmcs pointers, and using old/new instead of
prev/cur? Calling it prev is wonky since it's pulled from the current
value of loaded_cpu_state, especially since cur is the same type.
That oddity is also why I grabbed prev before setting loaded_vmcs;
it just felt wrong even though they really are two separate things.
static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
struct loaded_vmcs *old,
struct loaded_vmcs *new)
{
...
}
{
vmx_sync_vmcs_host_state(vmx, vmx->loaded_cpu_state, vmcs);
}
> +{
> + struct loaded_vmcs *prev = vmx->loaded_cpu_state;
> + struct loaded_vmcs *cur;
> + struct vmcs_host_state *dest, *src;
> +
> + if (unlikely(!prev))
> + return;
> +
> + cur = vmx->loaded_vmcs;
> + src = &prev->host_state;
> + dest = &cur->host_state;
> +
> + vmx_set_host_fs_gs(dest, src->fs_sel, src->gs_sel, src->fs_base, src->gs_base);
> + dest->ldt_sel = src->ldt_sel;
> +#ifdef CONFIG_X86_64
> + dest->ds_sel = src->ds_sel;
> + dest->es_sel = src->es_sel;
> +#endif
> + vmx->loaded_cpu_state = cur;
> +}
> +
>  static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	struct vmcs_host_state *src;
> -	struct loaded_vmcs *prev;
>  	int cpu;
> 
>  	if (vmx->loaded_vmcs == vmcs)
>  		return;
> 
>  	cpu = get_cpu();
> -	prev = vmx->loaded_cpu_state;
>  	vmx->loaded_vmcs = vmcs;
>  	vmx_vcpu_load(vcpu, cpu);
> -
> -	if (likely(prev)) {
> -		src = &prev->host_state;
> -
> -		vmx_set_host_fs_gs(&vmcs->host_state, src->fs_sel, src->gs_sel,
> -				   src->fs_base, src->gs_base);
> -
> -		vmcs->host_state.ldt_sel = src->ldt_sel;
> -#ifdef CONFIG_X86_64
> -		vmcs->host_state.ds_sel = src->ds_sel;
> -		vmcs->host_state.es_sel = src->es_sel;
> -#endif
> -		vmx->loaded_cpu_state = vmcs;
> -	}
> +	vmx_sync_vmcs_host_state(vmx);
>  	put_cpu();
> 
>  	vm_entry_controls_reset_shadow(vmx);
>
> Paolo
* Re: [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS
2019-06-06 18:57 ` Sean Christopherson
@ 2019-06-07 17:00 ` Paolo Bonzini
2019-06-07 17:08 ` Sean Christopherson
0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2019-06-07 17:00 UTC (permalink / raw)
To: Sean Christopherson
Cc: Radim Krčmář, kvm, Nadav Amit, Liran Alon,
Vitaly Kuznetsov
On 06/06/19 20:57, Sean Christopherson wrote:
> What about taking the vmcs pointers, and using old/new instead of
> prev/cur? Calling it prev is wonky since it's pulled from the current
> value of loaded_cpu_state, especially since cur is the same type.
> That oddity is also why I grabbed prev before setting loaded_vmcs,
> it just felt wrong even though they really are two separate things.
>
> static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
> 				     struct loaded_vmcs *old,
> 				     struct loaded_vmcs *new)
I had it like that in the beginning actually. But the idea of this
function is that because we're switching vmcs's, the host register
fields have to be moved to the VMCS that will be used next. I don't see
how it would be used with old and new being anything other than
vmx->loaded_cpu_state and vmx->loaded_vmcs and, because we're switching
VMCS, those are the "previously" active VMCS and the "currently" active
VMCS.
What would also make sense is to change loaded_cpu_state to a bool (it
must always be equal to loaded_vmcs anyway) and make the prototype
something like this:

static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
				     struct loaded_vmcs *prev)
I'll send a patch.
Paolo
* Re: [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS
2019-06-07 17:00 ` Paolo Bonzini
@ 2019-06-07 17:08 ` Sean Christopherson
0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-06-07 17:08 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Radim Krčmář, kvm, Nadav Amit, Liran Alon,
Vitaly Kuznetsov
On Fri, Jun 07, 2019 at 07:00:06PM +0200, Paolo Bonzini wrote:
> On 06/06/19 20:57, Sean Christopherson wrote:
> > What about taking the vmcs pointers, and using old/new instead of
> > prev/cur? Calling it prev is wonky since it's pulled from the current
> > value of loaded_cpu_state, especially since cur is the same type.
> > That oddity is also why I grabbed prev before setting loaded_vmcs,
> > it just felt wrong even though they really are two separate things.
> >
> > static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
> > 				     struct loaded_vmcs *old,
> > 				     struct loaded_vmcs *new)
>
> I had it like that in the beginning actually. But the idea of this
> function is that because we're switching vmcs's, the host register
> fields have to be moved to the VMCS that will be used next. I don't see
> how it would be used with old and new being anything other than
> vmx->loaded_cpu_state and vmx->loaded_vmcs and, because we're switching
> VMCS, those are the "previously" active VMCS and the "currently" active
> VMCS.
>
> What would also make sense is to change loaded_cpu_state to a bool (it
> must always be equal to loaded_vmcs anyway) and make the prototype
> something like this:
>
> static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
> 				     struct loaded_vmcs *prev)
>
>
> I'll send a patch.
Works for me. The only reason I made loaded_cpu_state was so that
vmx_prepare_switch_to_host() could WARN on it diverging from loaded_vmcs.
Seeing as how that WARN has never fired, I'm comfortable making it a bool.
Thread overview: 27+ messages
2019-05-07 16:06 [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Sean Christopherson
2019-05-07 16:06 ` [PATCH 01/15] KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped Sean Christopherson
2019-05-07 20:09 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 02/15] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value Sean Christopherson
2019-05-07 16:06 ` [PATCH 03/15] KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01 Sean Christopherson
2019-05-07 16:06 ` [PATCH 04/15] KVM: nVMX: Write ENCLS-exiting bitmap once per vmcs02 Sean Christopherson
2019-05-07 16:06 ` [PATCH 05/15] KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry Sean Christopherson
2019-06-06 15:49 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 06/15] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS Sean Christopherson
2019-06-06 16:24 ` Paolo Bonzini
2019-06-06 18:57 ` Sean Christopherson
2019-06-07 17:00 ` Paolo Bonzini
2019-06-07 17:08 ` Sean Christopherson
2019-05-07 16:06 ` [PATCH 07/15] KVM: nVMX: Don't reread VMCS-agnostic " Sean Christopherson
2019-05-07 21:01 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 08/15] KVM: nVMX: Don't speculatively write virtual-APIC page address Sean Christopherson
2019-05-07 16:06 ` [PATCH 09/15] KVM: nVMX: Don't speculatively write APIC-access " Sean Christopherson
2019-05-07 16:06 ` [PATCH 10/15] KVM: nVMX: Update vmcs12 for MSR_IA32_CR_PAT when it's written Sean Christopherson
2019-05-07 16:06 ` [PATCH 11/15] KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written Sean Christopherson
2019-06-06 16:35 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 12/15] KVM: nVMX: Update vmcs12 for MSR_IA32_DEBUGCTLMSR when it's written Sean Christopherson
2019-05-07 16:06 ` [PATCH 13/15] KVM: nVMX: Update vmcs02 GUEST_IA32_DEBUGCTL only when vmcs12 is dirty Sean Christopherson
2019-06-06 16:39 ` Paolo Bonzini
2019-05-07 16:06 ` [PATCH 14/15] KVM: nVMX: Don't update GUEST_BNDCFGS if it's clean in HV eVMCS Sean Christopherson
2019-05-07 16:06 ` [PATCH 15/15] KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary Sean Christopherson
2019-06-06 16:53 ` Paolo Bonzini
2019-06-06 16:54 ` [PATCH 00/15] KVM: nVMX: Optimize nested VM-Entry Paolo Bonzini