From: Paolo Bonzini
Subject: Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Date: Thu, 10 Oct 2013 18:20:43 +0200
Message-ID: <5256D3DB.7090107@redhat.com>
References: <1379319104-10266-1-git-send-email-yzt356@gmail.com> <52444CF6.1020102@redhat.com> <52493F8C.6040009@web.de> <524C6A30.9090403@web.de> <5256D1F0.7000905@siemens.com>
In-Reply-To: <5256D1F0.7000905@siemens.com>
To: Jan Kiszka
Cc: Arthur Chunqi Li, kvm@vger.kernel.org, gleb@redhat.com, "Zhang, Yang Z"

On 10/10/2013 18:12, Jan Kiszka wrote:
> On 2013-10-02 20:47, Jan Kiszka wrote:
>> On 2013-09-30 11:08, Jan Kiszka wrote:
>>> On 2013-09-26 17:04, Paolo Bonzini wrote:
>>>> On 16/09/2013 10:11, Arthur Chunqi Li wrote:
>>>>> This patch contains the following two changes:
>>>>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0
>>>>> with some reasons not emulated by L1, preemption timer value should
>>>>> be saved in such exits.
>>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls
>>>>> to nVMX.
>>>>>
>>>>> With this patch, nested VMX preemption timer features are fully
>>>>> supported.
>>>>>
>>>>> Signed-off-by: Arthur Chunqi Li
>>>>> ---
>>>>> ChangeLog to v4:
>>>>> Format changes and remove a flag in nested_vmx.
>>>>> arch/x86/include/uapi/asm/msr-index.h | 1 +
>>>>> arch/x86/kvm/vmx.c | 44 +++++++++++++++++++++++++++++++--
>>>>> 2 files changed, 43 insertions(+), 2 deletions(-)
>>>>
>>>> Hi all,
>>>>
>>>> the test fails for me if the preemption timer value is set to a value
>>>> that is above ~2000 (which means ~65000 TSC cycles on this machine).
>>>> The preemption timer seems to count faster than what is expected, for
>>>> example only up to 4 million cycles if you set it to one million.
>>>> So, I am leaving the patch out of kvm/queue for now, until I can
>>>> test it on more processors.
>>>
>>> I've done some measurements with the help of ftrace on the time it takes
>>> to let the preemption timer trigger (no adjustments via Arthur's patch
>>> were involved): On my Core i7-620M, the preemption timer seems to tick
>>> almost 10 times faster than spec and scale value (5) suggest. I've
>>> loaded a value of 100000, and it took about 130 µs until I got a vmexit
>>> with reason PREEMPTION_TIMER (no other exits in between).
>>>
>>> qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 100000
>>> qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
>>> qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0
>>>
>>> That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz *
>>> 2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty
>>> clear on this. It just doesn't match reality. Very strange.
>>
>> ...but documented: I found a related erratum for my processor (AAT59)
>> and also for Xeon 5500 (AAK139). At least the current Haswell generation is
>> not affected. I can test the patch on a Haswell board I have at work
>> later this week.
>
> To complete this story: Arthur's patch works fine on a non-broken CPU
> (here: i7-4770S).
>
> Arthur, find some fix-ups for your test case below.
> It avoids printing
> from within L2 as this could deadlock when the timer fires and L1 then
> tries to print something. Also, it disables the preemption timer on
> leave so that it cannot fire later on again. If you want to fold this
> into your patch, feel free. Otherwise I can post a separate patch on
> top.

Is that a Signed-off-by? :)

BTW, VirtualBox has a test for this erratum. It would be nice to skip the
test when the processor is found to be buggy.

I'll put Arthur's patch back. Thanks for testing!

Paolo

static bool hmR0InitIntelIsSubjectToVmxPreemptionTimerErratum(void)
{
    uint32_t u = ASMCpuId_EAX(1);
    u &= ~(RT_BIT_32(14) | RT_BIT_32(15) | RT_BIT_32(28) | RT_BIT_32(29) | RT_BIT_32(30) | RT_BIT_32(31));
    if (   u == UINT32_C(0x000206E6) /* 323344.pdf - BA86 - D0 - Intel Xeon Processor 7500 Series */
        || u == UINT32_C(0x00020652) /* 323056.pdf - AAX65 - C2 - Intel Xeon Processor L3406 */
        || u == UINT32_C(0x00020652) /* 322814.pdf - AAT59 - C2 - Intel CoreTM i7-600, i5-500, i5-400 and i3-300 Mobile Processor Series */
        || u == UINT32_C(0x00020652) /* 322911.pdf - AAU65 - C2 - Intel CoreTM i5-600, i3-500 Desktop Processor Series and Intel Pentium Processor G6950 */
        || u == UINT32_C(0x00020655) /* 322911.pdf - AAU65 - K0 - Intel CoreTM i5-600, i3-500 Desktop Processor Series and Intel Pentium Processor G6950 */
        || u == UINT32_C(0x000106E5) /* 322373.pdf - AAO95 - B1 - Intel Xeon Processor 3400 Series */
        || u == UINT32_C(0x000106E5) /* 322166.pdf - AAN92 - B1 - Intel CoreTM i7-800 and i5-700 Desktop Processor Series */
        || u == UINT32_C(0x000106E5) /* 320767.pdf - AAP86 - B1 - Intel Core i7-900 Mobile Processor Extreme Edition Series, Intel Core i7-800 and i7-700 Mobile Processor Series */
        || u == UINT32_C(0x000106A0) /*?321333.pdf - AAM126 - C0 - Intel Xeon Processor 3500 Series Specification */
        || u == UINT32_C(0x000106A1) /*?321333.pdf - AAM126 - C1 - Intel Xeon Processor 3500 Series Specification */
        || u == UINT32_C(0x000106A4) /* 320836.pdf - AAJ124 - C0 - Intel Core i7-900 Desktop Processor Extreme Edition Series and Intel Core i7-900 Desktop Processor Series */
        || u == UINT32_C(0x000106A5) /* 321333.pdf - AAM126 - D0 - Intel Xeon Processor 3500 Series Specification */
        || u == UINT32_C(0x000106A5) /* 321324.pdf - AAK139 - D0 - Intel Xeon Processor 5500 Series Specification */
        || u == UINT32_C(0x000106A5) /* 320836.pdf - AAJ124 - D0 - Intel Core i7-900 Desktop Processor Extreme Edition Series and Intel Core i7-900 Desktop Processor Series */
       )
        return true;
    return false;
}

> Jan
>
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 4372878..66a4201 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -141,6 +141,9 @@ void preemption_timer_init()
>  	preempt_val = 10000000;
>  	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
>  	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
> +
> +	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> +		printf("\tSave preemption value is not supported\n");
>  }
>  
>  void preemption_timer_main()
> @@ -150,9 +153,7 @@ void preemption_timer_main()
>  		printf("\tPreemption timer is not supported\n");
>  		return;
>  	}
> -	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> -		printf("\tSave preemption value is not supported\n");
> -	else {
> +	if (ctrl_exit_rev.clr & EXI_SAVE_PREEMPT) {
>  		set_stage(0);
>  		vmcall();
>  		if (get_stage() == 1)
> @@ -161,8 +162,8 @@ void preemption_timer_main()
>  	while (1) {
>  		if (((rdtsc() - tsc_val) >> preempt_scale)
>  				> 10 * preempt_val) {
> -			report("Preemption timer", 0);
> -			break;
> +			set_stage(2);
> +			vmcall();
>  		}
>  	}
>  }
> @@ -183,7 +184,7 @@ int preemption_timer_exit_handler()
>  			report("Preemption timer", 0);
>  		else
>  			report("Preemption timer", 1);
> -		return VMX_TEST_VMEXIT;
> +		break;
>  	case VMX_VMCALL:
>  		switch (get_stage()) {
>  		case 0:
> @@ -195,24 +196,29 @@ int preemption_timer_exit_handler()
>  					EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
>  				vmcs_write(EXI_CONTROLS, ctrl_exit);
>  			}
> -			break;
> +			vmcs_write(GUEST_RIP, guest_rip + insn_len);
> +			return VMX_TEST_RESUME;
>  		case 1:
>  			if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val)
>  				report("Save preemption value", 0);
>  			else
>  				report("Save preemption value", 1);
> +			vmcs_write(GUEST_RIP, guest_rip + insn_len);
> +			return VMX_TEST_RESUME;
> +		case 2:
> +			report("Preemption timer", 0);
>  			break;
>  		default:
>  			printf("Invalid stage.\n");
>  			print_vmexit_info();
> -			return VMX_TEST_VMEXIT;
> +			break;
>  		}
> -		vmcs_write(GUEST_RIP, guest_rip + insn_len);
> -		return VMX_TEST_RESUME;
> +		break;
>  	default:
>  		printf("Unknown exit reason, %d\n", reason);
>  		print_vmexit_info();
>  	}
> +	vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_PREEMPT);
>  	return VMX_TEST_VMEXIT;
>  }
>  
>
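
For reference, the arithmetic behind Jan's frequency estimate quoted above
can be written down as a small C sketch. This is only an illustration, not
code from the patch or from kvm-unit-tests: the function names
(observed_timer_hz, implied_tsc_hz, speedup_vs_spec) are made up here, and
the only rule it assumes is the architectural one that the preemption timer
counts down once per 2^scale TSC cycles, where scale is the low five bits
of IA32_VMX_MISC.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Observed preemption-timer tick rate in Hz, given the value loaded into
 * the timer and the time (in microseconds) until the PREEMPTION_TIMER exit.
 * Hypothetical helper, named for this sketch only.
 */
double observed_timer_hz(uint64_t loaded_ticks, double elapsed_us)
{
	assert(elapsed_us > 0.0);
	return (double)loaded_ticks / (elapsed_us * 1e-6);
}

/*
 * TSC rate that would be needed for the timer to tick at timer_hz if the
 * hardware followed the spec: one timer tick per 2^scale TSC cycles.
 */
double implied_tsc_hz(double timer_hz, unsigned int scale)
{
	assert(scale < 32);
	return timer_hz * (double)(1ULL << scale);
}

/*
 * How many times faster than the spec the timer is running, given the
 * real TSC frequency. A value close to 1.0 means the CPU behaves as
 * documented.
 */
double speedup_vs_spec(double timer_hz, unsigned int scale, double real_tsc_hz)
{
	assert(real_tsc_hz > 0.0);
	return implied_tsc_hz(timer_hz, scale) / real_tsc_hz;
}
```

With the numbers from the trace (100000 ticks in about 130 µs, scale 5, TSC
at 2.66 GHz), observed_timer_hz() gives roughly 769 MHz, implied_tsc_hz()
roughly 24.6 GHz, and speedup_vs_spec() a factor a bit above 9, which is the
"almost 10 times faster" above.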