From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kiszka Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Date: Sun, 25 Aug 2013 10:53:49 +0200 Message-ID: <5219C61D.3080306@web.de> References: <1377369850-18583-1-git-send-email-root@Blade1-01.Blade1-01> <5219A7BA.8050602@web.de> <5219B590.7040304@web.de> <5219B825.5000806@web.de> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="fDsfqBP5lfs8e3lrPigWIH0LcHSkh34A2" Cc: Abel Gordon , Gleb Natapov , kvm , Paolo Bonzini To: Arthur Chunqi Li Return-path: Received: from mout.web.de ([212.227.15.14]:53696 "EHLO mout.web.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756129Ab3HYIxy (ORCPT ); Sun, 25 Aug 2013 04:53:54 -0400 Received: from mchn199C.mchp.siemens.de ([95.157.58.223]) by smtp.web.de (mrweb102) with ESMTPSA (Nemesis) id 0McWnE-1VV2xs0SO0-00HgqL for ; Sun, 25 Aug 2013 10:53:52 +0200 In-Reply-To: Sender: kvm-owner@vger.kernel.org List-ID: This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --fDsfqBP5lfs8e3lrPigWIH0LcHSkh34A2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 2013-08-25 10:41, Arthur Chunqi Li wrote: > On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon wrote: >> >> >> kvm-owner@vger.kernel.org wrote on 25/08/2013 10:54:13 AM: >> >>> From: Jan Kiszka >>> To: Abel Gordon/Haifa/IBM@IBMIL, >>> Cc: gleb@redhat.com, kvm , pbonzini@redhat.com, >>> "=E6=9D=8E=E6=98=A5=E5=A5=87 " >>> Date: 25/08/2013 10:54 AM >>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemptio= n >> timer >>> Sent by: kvm-owner@vger.kernel.org >>> >>> On 2013-08-25 09:50, Abel Gordon wrote: >>>> >>>> >>>> kvm-owner@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: >>>> >>>>> From: Jan Kiszka >>>>> To: Abel Gordon/Haifa/IBM@IBMIL, >>>>> Cc: gleb@redhat.com, kvm@vger.kernel.org, kvm-owner@vger.kernel.org= , >>>>> pbonzini@redhat.com, "=E6=9D=8E=E6=98=A5=E5=A5=87 " >>>>> Date: 25/08/2013 10:43 AM >>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preempt= ion >>>> timer >>>>> Sent by: kvm-owner@vger.kernel.org >>>>> >>>>> On 2013-08-25 09:37, Abel Gordon wrote: >>>>>> >>>>>> >>>>>>> From: Jan Kiszka >>>>>>> To: "=E6=9D=8E=E6=98=A5=E5=A5=87 " , >>>>>>> Cc: kvm@vger.kernel.org, gleb@redhat.com, pbonzini@redhat.com >>>>>>> Date: 25/08/2013 09:44 AM >>>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX >> preemption >>>>>> timer >>>>>>> Sent by: kvm-owner@vger.kernel.org >>>>>>> >>>>>>> On 2013-08-24 20:44, root wrote: >>>>>>>> This patch contains the following two changes: >>>>>>>> 1. Fix the bug in nested preemption timer support. If vmexit L2-= >L0 >>>>>>>> with some reasons not emulated by L1, preemption timer value sho= uld >>>>>>>> be save in such exits. >>>>>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit >> controls >>>>>>>> to nVMX. >>>>>>>> >>>>>>>> With this patch, nested VMX preemption timer features are fully >>>>>>>> supported. >>>>>>>> >>>>>>>> Signed-off-by: Arthur Chunqi Li >>>>>>>> --- >>>>>> >>>>>>>> >>>>>>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcp= u >>>>>>> *vcpu, struct vmcs12 *vmcs12) >>>>>>>> (vmcs_config.pin_based_exec_ctrl | >>>>>>>> vmcs12->pin_based_vm_exec_control)); >>>>>>>> >>>>>>>> - if (vmcs12->pin_based_vm_exec_control & >>>>>> PIN_BASED_VMX_PREEMPTION_TIMER) >>>>>>>> - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, >>>>>>>> - vmcs12->vmx_preemption_timer_value); >>>>>>>> + if (vmcs12->pin_based_vm_exec_control & >>>>>>> PIN_BASED_VMX_PREEMPTION_TIMER) { >>>>>>>> + if (vmcs12->vm_exit_controls & >>>>>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) >>>>>>>> + vmcs12->vmx_preemption_timer_value =3D >>>>>>>> + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); >>>>>>>> + else >>>>>>>> + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, >>>>>>>> + vmcs12->vmx_preemption_timer_value); >>>>>>>> + } >>>>>>> >>>>>>> This is not correct. We still need to set the vmcs to >>>>>>> vmx_preemption_timer_value. The difference is that, on exit from = L2, >>>>>>> vmx_preemption_timer_value has to be updated according to the sav= ed >>>>>>> hardware state. The corresponding code is missing in your patch s= o >>>> far. >>>>>> >>>>>> I think something else maybe be missing here: assuming L0 handles >> exits >>>>>> for L2 without involving L1 (e.g. external interrupts or ept >>>> violations), >>>>>> then, we may spend some cycles in L0 handling these exits. Note L1= is >>>> not >>>>>> aware of these exits and from L1 perspective L2 was running on the= >> CPU. >>>>>> That means that we may need to reduce these cycles spent at >>>>>> L0 from the preemtion timer or emulate a preemption timer exit to >>>>>> force a transition to L1 instead of resuming L2. >>>>> >>>>> That's precisely what the logic I described should achieve: reload = the >>>>> value we saved on L2 exit on reentry. >>>> >>>> But don't you think we should also reduce the cycles spent at L0 fro= m >> the >>>> preemption timer ? I mean, if we spent X cycles at L0 handling a L2 >> exit >>>> which was not forwarded to L1, then, before we resume L2, >>>> the preemption timer should be: (previous_value_on_exit - X). >>>> If (previous_value_on_exit - X) < 0, then we should force ("emulate"= ) a >>>> preemption timer exit between L2 and L1. >>> >>> We ask the hardware to save the value of the preemption on L2 exit. T= his >>> value will be exposed to L1 (if it asked for saving as well) and/or b= e >>> written back to the hardware on L2 reenty (unless L1 had a chance to = run >>> and modified it). So the time spent in L0 is implicitly subtracted. >> >> I think you are suggesting the following, please correct me if I am wr= ong. >> 1) L1 resumes L2 with preemption timer enabled >> 2) L0 emulates the resume/launch >> 3) L2 runs for Y cycles until an external interrupt occurs (Y < preemp= tion >> timer specified by L1) >> 4) L0 saved the preemption timer (original value - Y) >> 5) L0 spends X cycles handling the external interrupt >> 6) L0 resumes L2 with preemption timer =3D original value - Y >> >> Note that in this case "X is ignored". >> >> I was suggesting to do the following: >> 6) If original value - Y - X > 0 then >> L0 resumes L2 with preemption timer =3D original value - Y - X >> else >> L0 emulates a L2->L1 preemption timer exit (resumes L1) > Yes, your description is right. But I'm also thinking about my > previous consideration, why should we consider such X cycles as what > L2 spent. For nested VMX. external interrupt is not provided by L1, it > is triggered from L0 and want to cause periodically exit to L1, L2 is > "accidentally injure" actually. Since these interrupts are not > generated from L1 and not attend to affect L2, these cycles should not > be treated as what L2 spent. Though these cycles are "spent" in view > of L1, but they should not be taken into consideration in nested VMX. >=20 > For another example, if vcpu scheduled out when L0 handing such > interrupts and CPU does some other things then schedule this vcpu > again, these cycles of executing other processes should not be treated > as what L2 spent definitely. Think of your preemption timer test case: There you are indirectly comparing the timer value against the TSC by checking the a preemption timer exit happened after no more than n TSC cycles. But as the TSC L1 and L2 sees continued to tick while in L0, this test could now fail when we leave out the L0 cycles. An alternative would be to hide all L0 TSC cycles from the guest. But that's not the way KVM works, independent of the preemption timer case. BTW, you should use guest_read_tsc() on exit/entry of L2 in order to calculate the time spent in L0. This will ensure that potential tweaks of TSC_OFFSET that L0 might have applied in the meantime will be taken into account. Jan --fDsfqBP5lfs8e3lrPigWIH0LcHSkh34A2 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlIZxh4ACgkQitSsb3rl5xRAEQCeJEarDQJmwlsHGXOK+KuqvnMx 11oAn2sOcOOCfvTtOjA7V8h9ep1pO9An =W3oa -----END PGP SIGNATURE----- --fDsfqBP5lfs8e3lrPigWIH0LcHSkh34A2--