From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kiszka Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Date: Sun, 25 Aug 2013 11:08:35 +0200 Message-ID: <5219C993.4040700@web.de> References: <1377369850-18583-1-git-send-email-root@Blade1-01.Blade1-01> <5219A7BA.8050602@web.de> <5219B590.7040304@web.de> <5219B825.5000806@web.de> <5219C61D.3080306@web.de> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="b3iMMtUA1pQvrddKb5Ja6imB7LsQSrsOG" Cc: Abel Gordon , Gleb Natapov , kvm , Paolo Bonzini To: Arthur Chunqi Li Return-path: Received: from mout.web.de ([212.227.15.4]:50928 "EHLO mout.web.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755257Ab3HYJIk (ORCPT ); Sun, 25 Aug 2013 05:08:40 -0400 Received: from mchn199C.mchp.siemens.de ([95.157.58.223]) by smtp.web.de (mrweb101) with ESMTPSA (Nemesis) id 0M9ojk-1VK95p3wxE-00B62Z for ; Sun, 25 Aug 2013 11:08:38 +0200 In-Reply-To: Sender: kvm-owner@vger.kernel.org List-ID: This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --b3iMMtUA1pQvrddKb5Ja6imB7LsQSrsOG Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 2013-08-25 11:07, Arthur Chunqi Li wrote: > On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka wrote: >> On 2013-08-25 10:41, Arthur Chunqi Li wrote: >>> On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon wrote= : >>>> >>>> >>>> kvm-owner@vger.kernel.org wrote on 25/08/2013 10:54:13 AM: >>>> >>>>> From: Jan Kiszka >>>>> To: Abel Gordon/Haifa/IBM@IBMIL, >>>>> Cc: gleb@redhat.com, kvm , pbonzini@redhat.com= , >>>>> "=E6=9D=8E=E6=98=A5=E5=A5=87 " >>>>> Date: 25/08/2013 10:54 AM >>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preempt= ion >>>> timer >>>>> Sent by: kvm-owner@vger.kernel.org >>>>> >>>>> On 2013-08-25 09:50, Abel Gordon wrote: >>>>>> >>>>>> >>>>>> kvm-owner@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: >>>>>> >>>>>>> From: Jan Kiszka >>>>>>> To: Abel Gordon/Haifa/IBM@IBMIL, >>>>>>> Cc: gleb@redhat.com, kvm@vger.kernel.org, kvm-owner@vger.kernel.o= rg, >>>>>>> pbonzini@redhat.com, "=E6=9D=8E=E6=98=A5=E5=A5=87 " >>>>>>> Date: 25/08/2013 10:43 AM >>>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preem= ption >>>>>> timer >>>>>>> Sent by: kvm-owner@vger.kernel.org >>>>>>> >>>>>>> On 2013-08-25 09:37, Abel Gordon wrote: >>>>>>>> >>>>>>>> >>>>>>>>> From: Jan Kiszka >>>>>>>>> To: "=E6=9D=8E=E6=98=A5=E5=A5=87 " , >>>>>>>>> Cc: kvm@vger.kernel.org, gleb@redhat.com, pbonzini@redhat.com >>>>>>>>> Date: 25/08/2013 09:44 AM >>>>>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX >>>> preemption >>>>>>>> timer >>>>>>>>> Sent by: kvm-owner@vger.kernel.org >>>>>>>>> >>>>>>>>> On 2013-08-24 20:44, root wrote: >>>>>>>>>> This patch contains the following two changes: >>>>>>>>>> 1. Fix the bug in nested preemption timer support. If vmexit L= 2->L0 >>>>>>>>>> with some reasons not emulated by L1, preemption timer value s= hould >>>>>>>>>> be save in such exits. >>>>>>>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit >>>> controls >>>>>>>>>> to nVMX. >>>>>>>>>> >>>>>>>>>> With this patch, nested VMX preemption timer features are full= y >>>>>>>>>> supported. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Arthur Chunqi Li >>>>>>>>>> --- >>>>>>>> >>>>>>>>>> >>>>>>>>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_v= cpu >>>>>>>>> *vcpu, struct vmcs12 *vmcs12) >>>>>>>>>> (vmcs_config.pin_based_exec_ctrl | >>>>>>>>>> vmcs12->pin_based_vm_exec_control)); >>>>>>>>>> >>>>>>>>>> - if (vmcs12->pin_based_vm_exec_control & >>>>>>>> PIN_BASED_VMX_PREEMPTION_TIMER) >>>>>>>>>> - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, >>>>>>>>>> - vmcs12->vmx_preemption_timer_value); >>>>>>>>>> + if (vmcs12->pin_based_vm_exec_control & >>>>>>>>> PIN_BASED_VMX_PREEMPTION_TIMER) { >>>>>>>>>> + if (vmcs12->vm_exit_controls & >>>>>>>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) >>>>>>>>>> + vmcs12->vmx_preemption_timer_value =3D >>>>>>>>>> + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); >>>>>>>>>> + else >>>>>>>>>> + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, >>>>>>>>>> + vmcs12->vmx_preemption_timer_value); >>>>>>>>>> + } >>>>>>>>> >>>>>>>>> This is not correct. We still need to set the vmcs to >>>>>>>>> vmx_preemption_timer_value. The difference is that, on exit fro= m L2, >>>>>>>>> vmx_preemption_timer_value has to be updated according to the s= aved >>>>>>>>> hardware state. The corresponding code is missing in your patch= so >>>>>> far. >>>>>>>> >>>>>>>> I think something else maybe be missing here: assuming L0 handle= s >>>> exits >>>>>>>> for L2 without involving L1 (e.g. external interrupts or ept >>>>>> violations), >>>>>>>> then, we may spend some cycles in L0 handling these exits. Note = L1 is >>>>>> not >>>>>>>> aware of these exits and from L1 perspective L2 was running on t= he >>>> CPU. >>>>>>>> That means that we may need to reduce these cycles spent at >>>>>>>> L0 from the preemtion timer or emulate a preemption timer exit t= o >>>>>>>> force a transition to L1 instead of resuming L2. >>>>>>> >>>>>>> That's precisely what the logic I described should achieve: reloa= d the >>>>>>> value we saved on L2 exit on reentry. >>>>>> >>>>>> But don't you think we should also reduce the cycles spent at L0 f= rom >>>> the >>>>>> preemption timer ? I mean, if we spent X cycles at L0 handling a L= 2 >>>> exit >>>>>> which was not forwarded to L1, then, before we resume L2, >>>>>> the preemption timer should be: (previous_value_on_exit - X). >>>>>> If (previous_value_on_exit - X) < 0, then we should force ("emulat= e") a >>>>>> preemption timer exit between L2 and L1. >>>>> >>>>> We ask the hardware to save the value of the preemption on L2 exit.= This >>>>> value will be exposed to L1 (if it asked for saving as well) and/or= be >>>>> written back to the hardware on L2 reenty (unless L1 had a chance t= o run >>>>> and modified it). So the time spent in L0 is implicitly subtracted.= >>>> >>>> I think you are suggesting the following, please correct me if I am = wrong. >>>> 1) L1 resumes L2 with preemption timer enabled >>>> 2) L0 emulates the resume/launch >>>> 3) L2 runs for Y cycles until an external interrupt occurs (Y < pree= mption >>>> timer specified by L1) >>>> 4) L0 saved the preemption timer (original value - Y) >>>> 5) L0 spends X cycles handling the external interrupt >>>> 6) L0 resumes L2 with preemption timer =3D original value - Y >>>> >>>> Note that in this case "X is ignored". >>>> >>>> I was suggesting to do the following: >>>> 6) If original value - Y - X > 0 then >>>> L0 resumes L2 with preemption timer =3D original value - Y - X >>>> else >>>> L0 emulates a L2->L1 preemption timer exit (resumes L1) >>> Yes, your description is right. But I'm also thinking about my >>> previous consideration, why should we consider such X cycles as what >>> L2 spent. For nested VMX. external interrupt is not provided by L1, i= t >>> is triggered from L0 and want to cause periodically exit to L1, L2 is= >>> "accidentally injure" actually. Since these interrupts are not >>> generated from L1 and not attend to affect L2, these cycles should no= t >>> be treated as what L2 spent. Though these cycles are "spent" in view >>> of L1, but they should not be taken into consideration in nested VMX.= >>> >>> For another example, if vcpu scheduled out when L0 handing such >>> interrupts and CPU does some other things then schedule this vcpu >>> again, these cycles of executing other processes should not be treate= d >>> as what L2 spent definitely. >> >> Think of your preemption timer test case: There you are indirectly >> comparing the timer value against the TSC by checking the a preemption= >> timer exit happened after no more than n TSC cycles. But as the TSC L1= >> and L2 sees continued to tick while in L0, this test could now fail wh= en >> we leave out the L0 cycles. >> >> An alternative would be to hide all L0 TSC cycles from the guest. But >> that's not the way KVM works, independent of the preemption timer case= =2E >> >> BTW, you should use guest_read_tsc() on exit/entry of L2 in order to >> calculate the time spent in L0. This will ensure that potential tweaks= >> of TSC_OFFSET that L0 might have applied in the meantime will be taken= >> into account. > Well, in this case, these X cycles is actually not in L1 and L2, but > it is treated that L2 consumes them, which seems like these cycles are > "stolen". Yes, they are stolen by L0 from L2. Jan --b3iMMtUA1pQvrddKb5Ja6imB7LsQSrsOG Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlIZyZQACgkQitSsb3rl5xSrNgCggFOqG1EKNtkf/4zDNcoJxULj 8xAAoIqkC1Ab2kj1szo4SpgMz+wKdf7j =iRvg -----END PGP SIGNATURE----- --b3iMMtUA1pQvrddKb5Ja6imB7LsQSrsOG--