* [PATCH] x86, kvm, vmx: Always use LOAD_IA32_EFER if available
@ 2014-11-08  2:25 Andy Lutomirski
  2014-11-10  4:33 ` Wanpeng Li
  2014-11-12 11:38 ` Paolo Bonzini
  0 siblings, 2 replies; 4+ messages in thread
From: Andy Lutomirski @ 2014-11-08  2:25 UTC (permalink / raw)
  To: kvm, linux-kernel, Paolo Bonzini, Gleb Natapov; +Cc: Andy Lutomirski

At least on Sandy Bridge, letting the CPU switch IA32_EFER is much
faster than switching it manually.

I benchmarked this using the vmexit kvm-unit-test (single run, but
GOAL multiplied by 5 to do more iterations):

Test                                  Before      After    Change
cpuid                                   2000       1932    -3.40%
vmcall                                  1914       1817    -5.07%
mov_from_cr8                              13         13     0.00%
mov_to_cr8                                19         19     0.00%
inl_from_pmtimer                       19164      10619   -44.59%
inl_from_qemu                          15662      10302   -34.22%
inl_from_kernel                         3916       3802    -2.91%
outl_to_kernel                          2230       2194    -1.61%
mov_dr                                   172        176     2.33%
ipi                                (skipped)  (skipped)
ipi+halt                           (skipped)  (skipped)
ple-round-robin                           13         13     0.00%
wr_tsc_adjust_msr                       1920       1845    -3.91%
rd_tsc_adjust_msr                       1892       1814    -4.12%
mmio-no-eventfd:pci-mem                16394      11165   -31.90%
mmio-wildcard-eventfd:pci-mem           4607       4645     0.82%
mmio-datamatch-eventfd:pci-mem          4601       4610     0.20%
portio-no-eventfd:pci-io               11507       7942   -30.98%
portio-wildcard-eventfd:pci-io          2239       2225    -0.63%
portio-datamatch-eventfd:pci-io         2250       2234    -0.71%

I haven't explicitly computed the significance of these numbers,
but this isn't subtle.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kvm/vmx.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3e556c68351b..e72b9660e51c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1659,8 +1659,14 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
 
 	clear_atomic_switch_msr(vmx, MSR_EFER);
-	/* On ept, can't emulate nx, and must switch nx atomically */
-	if (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX)) {
+
+	/*
+	 * On EPT, we can't emulate NX, so we must switch EFER atomically.
+	 * On CPUs that support "load IA32_EFER", always switch EFER
+	 * atomically, since it's faster than switching it manually.
+	 */
+	if (cpu_has_load_ia32_efer ||
+	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
 		guest_efer = vmx->vcpu.arch.efer;
 		if (!(guest_efer & EFER_LMA))
 			guest_efer &= ~EFER_LME;
-- 
1.9.3



* Re: [PATCH] x86, kvm, vmx: Always use LOAD_IA32_EFER if available
  2014-11-08  2:25 [PATCH] x86, kvm, vmx: Always use LOAD_IA32_EFER if available Andy Lutomirski
@ 2014-11-10  4:33 ` Wanpeng Li
  2014-11-10 19:31   ` Andy Lutomirski
  2014-11-12 11:38 ` Paolo Bonzini
  1 sibling, 1 reply; 4+ messages in thread
From: Wanpeng Li @ 2014-11-10  4:33 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: kvm, linux-kernel, Paolo Bonzini, Gleb Natapov

Hi Andy,
On Fri, Nov 07, 2014 at 06:25:18PM -0800, Andy Lutomirski wrote:
>At least on Sandy Bridge, letting the CPU switch IA32_EFER is much
>faster than switching it manually.
>
>I benchmarked this using the vmexit kvm-unit-test (single run, but
>GOAL multiplied by 5 to do more iterations):
>
>Test                                  Before      After    Change
>cpuid                                   2000       1932    -3.40%
>vmcall                                  1914       1817    -5.07%
>mov_from_cr8                              13         13     0.00%
>mov_to_cr8                                19         19     0.00%
>inl_from_pmtimer                       19164      10619   -44.59%
>inl_from_qemu                          15662      10302   -34.22%

How does IA32_EFER differ between guest and host in your config?

IIUC:
- If NX differs between guest and host, IA32_EFER is auto-loaded both
  with and without the patch.
- If SCE differs, IA32_EFER is switched through WRMSR (via the user-return
  notifier) without the patch, and auto-loaded with the patch.

Regards,
Wanpeng Li 



* Re: [PATCH] x86, kvm, vmx: Always use LOAD_IA32_EFER if available
  2014-11-10  4:33 ` Wanpeng Li
@ 2014-11-10 19:31   ` Andy Lutomirski
  0 siblings, 0 replies; 4+ messages in thread
From: Andy Lutomirski @ 2014-11-10 19:31 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: kvm list, linux-kernel@vger.kernel.org, Paolo Bonzini,
	Gleb Natapov

On Sun, Nov 9, 2014 at 8:33 PM, Wanpeng Li <wanpeng.li@linux.intel.com> wrote:
> Hi Andy,
> On Fri, Nov 07, 2014 at 06:25:18PM -0800, Andy Lutomirski wrote:
>>At least on Sandy Bridge, letting the CPU switch IA32_EFER is much
>>faster than switching it manually.
>>
>>I benchmarked this using the vmexit kvm-unit-test (single run, but
>>GOAL multiplied by 5 to do more iterations):
>>
>>Test                                  Before      After    Change
>>cpuid                                   2000       1932    -3.40%
>>vmcall                                  1914       1817    -5.07%
>>mov_from_cr8                              13         13     0.00%
>>mov_to_cr8                                19         19     0.00%
>>inl_from_pmtimer                       19164      10619   -44.59%
>>inl_from_qemu                          15662      10302   -34.22%
>
> How does IA32_EFER differ between guest and host in your config?
>
> IIUC:
> - If NX differs between guest and host, IA32_EFER is auto-loaded both
>   with and without the patch.
> - If SCE differs, IA32_EFER is switched through WRMSR (via the user-return
>   notifier) without the patch, and auto-loaded with the patch.

This is with kvm-unit-test as is, so NX is consistent but SCE is different.

--Andy


* Re: [PATCH] x86, kvm, vmx: Always use LOAD_IA32_EFER if available
  2014-11-08  2:25 [PATCH] x86, kvm, vmx: Always use LOAD_IA32_EFER if available Andy Lutomirski
  2014-11-10  4:33 ` Wanpeng Li
@ 2014-11-12 11:38 ` Paolo Bonzini
  1 sibling, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2014-11-12 11:38 UTC (permalink / raw)
  To: Andy Lutomirski, kvm, linux-kernel, Gleb Natapov



On 08/11/2014 03:25, Andy Lutomirski wrote:
> At least on Sandy Bridge, letting the CPU switch IA32_EFER is much
> faster than switching it manually.

I am committing this patch, with an additional remark in the commit message:

 The results were reproducible on all of Nehalem, Sandy Bridge and
 Ivy Bridge.  The slowness of manual switching is because writing
 to EFER with WRMSR triggers a TLB flush, even if the only bit you're
 touching is SCE (so the page table format is not affected).  Doing
 the write as part of vmentry/vmexit, instead, does not flush the TLB,
 probably because all processors that have EPT also have VPID.

Paolo
