From: Gleb Natapov
Subject: Re: [PATCH] KVM: nVMX: Fix direct injection of interrupts from L0 to L2
Date: Tue, 19 Feb 2013 15:13:33 +0200
Message-ID: <20130219131333.GJ3600@redhat.com>
In-Reply-To: <51234E11.7050801@web.de>
To: Jan Kiszka
Cc: Joerg Roedel, Marcelo Tosatti, kvm, "Nadav Har'El", "Nakajima, Jun", Alexander Graf

Copying Alex. He wrote nested SVM.

On Tue, Feb 19, 2013 at 11:04:01AM +0100, Jan Kiszka wrote:
> On 2013-02-17 18:35, Gleb Natapov wrote:
> > On Sun, Feb 17, 2013 at 06:01:05PM +0100, Jan Kiszka wrote:
> >> On 2013-02-17 17:26, Gleb Natapov wrote:
> >>> On Sun, Feb 17, 2013 at 04:31:26PM +0100, Jan Kiszka wrote:
> >>>> On 2013-02-17 16:07, Gleb Natapov wrote:
> >>>>> On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote:
> >>>>>> From: Jan Kiszka
> >>>>>>
> >>>>>> If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped
> >>>>>> vmx_complete_interrupts on L2 exits. This is required because, with
> >>>>>> direct interrupt injection from L0 to L2, L0 has to update its pending
> >>>>>> events.
> >>>>>>
> >>>>>> Also, we need to allow vmx_cancel_injection when entering L2 if we left
> >>>>>> to L0. This condition is indirectly derived from the absence of valid
> >>>>>> vectoring info in vmcs12. We now explicitly clear it if we find out that
> >>>>>> the L2 exit is not targeting L1 but L0.
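The patch hinges on vmx_complete_interrupts moving the event that was in flight at exit time, as recorded in the IDT-vectoring information field, back into L0's pending-event state. The C sketch below is a simplified, hypothetical model of that decoding, not KVM's code: `pending_events`, `complete_interrupts` and the `T_*` constants are made-up names, and only the field layout (bits 7:0 vector, bits 10:8 type, bit 11 error-code valid, bit 31 valid) is taken from the Intel SDM. It also omits details such as the instruction length that software-generated events need when they are re-injected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IDT-vectoring information field layout, per the Intel SDM. */
#define VEC_INFO_VECTOR_MASK   0x000000ffu /* bits 7:0  vector */
#define VEC_INFO_TYPE_MASK     0x00000700u /* bits 10:8 event type */
#define VEC_INFO_TYPE_SHIFT    8
#define VEC_INFO_ERRCODE_VALID 0x00000800u /* bit 11 error code valid */
#define VEC_INFO_VALID         0x80000000u /* bit 31 valid */

/* Type encodings decoded by this sketch (hypothetical names). */
enum { T_EXT_INTR = 0, T_NMI = 2, T_HW_EXCEPTION = 3, T_SW_INTR = 4, T_SW_EXCEPTION = 6 };

/* Hypothetical, simplified stand-in for the VCPU's pending-event state. */
struct pending_events {
    bool nmi_injected;
    bool irq_pending;
    uint8_t irq_vector;
    bool exc_pending;
    uint8_t exc_vector;
    bool exc_has_error;
    uint32_t exc_error_code;
    bool soft;      /* raised by an instruction (INTn, INT3, INTO) */
    bool unhandled; /* type encoding this model does not decode */
};

/* Requeue the event that was being delivered when the VM exit occurred. */
struct pending_events complete_interrupts(uint32_t idt_vectoring_info, uint32_t error_code)
{
    struct pending_events ev = {0};

    if (!(idt_vectoring_info & VEC_INFO_VALID))
        return ev; /* nothing was in flight */

    uint8_t vector = idt_vectoring_info & VEC_INFO_VECTOR_MASK;
    unsigned type = (idt_vectoring_info & VEC_INFO_TYPE_MASK) >> VEC_INFO_TYPE_SHIFT;

    switch (type) {
    case T_NMI:
        ev.nmi_injected = true;
        break;
    case T_SW_EXCEPTION:
        ev.soft = true;
        /* fall through */
    case T_HW_EXCEPTION:
        ev.exc_pending = true;
        ev.exc_vector = vector;
        if (idt_vectoring_info & VEC_INFO_ERRCODE_VALID) {
            ev.exc_has_error = true;
            ev.exc_error_code = error_code;
        }
        break;
    case T_SW_INTR:
        ev.soft = true;
        /* fall through */
    case T_EXT_INTR:
        ev.irq_pending = true;
        ev.irq_vector = vector;
        break;
    default:
        ev.unhandled = true;
        break;
    }

    /* The field describes a single event, so at most one queue is filled. */
    assert(ev.nmi_injected + ev.irq_pending + ev.exc_pending <= 1);
    return ev;
}
```

For example, `complete_interrupts(0x80000b0e, 4)` models a page fault (vector 14, hardware exception, error code 4) that was interrupted by the exit and must be queued again rather than dropped.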
> >>>>>>
> >>>>> We really need to overhaul how interrupt injection is emulated in nested
> >>>>> VMX. Why not put pending events into the event queue instead of
> >>>>> get_vmcs12(vcpu)->idt_vectoring_info_field and inject them in the usual way?
> >>>>
> >>>> I was thinking about the same step but felt unsure so far whether
> >>>> vmx_complete_interrupts & Co. include any assumptions about the
> >>>> vmcs configuration that won't match what L1 does. So I went for a
> >>>> different path first, specifically to avoid impact on these hairy bits
> >>>> for non-nested mode.
> >>>>
> >>> The assumptions made by those functions should still be correct, since the
> >>> guest VMCS configuration is not applied directly to real HW, but we should
> >>> be careful of course. For instance, interrupt queues should be cleared
> >>> during nested vmexit and the event transferred back to
> >>> idt_vectoring_info_field. IIRC this is how nested SVM works, BTW.
> >>
> >> Checking __vmx_complete_interrupts, the first issue I find is that type
> >> 5 (privileged software exception) is not decoded and thus will be lost if
> >> L2 leaves this way. That's a reason why it might be better to re-inject
> >> the content of vmcs12 if it is valid. VMX is a bit more hairy than SVM,
> >> I guess.
> >>
> > I do not see type 5 in SDM Table 24-15. We handle every type specified
> > there. Why shouldn't we? SVM and VMX are pretty close with regard to
> > event injection; this allowed us to move a lot of logic into the common
> > code.
>
> I had a look at SVM to check how it deals with this, but I'm not sure
> if I understand the logic correctly. SVM does:
>
> static int nested_svm_vmexit(struct vcpu_svm *svm)
> {
> ...
> 	/*
> 	 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> 	 * to make sure that we do not lose injected events. So check event_inj
> 	 * here and copy it to exit_int_info if it is valid.
> 	 * Exit_int_info and event_inj can't be both valid because the case
> 	 * below only happens on a VMRUN instruction intercept which has
> 	 * no valid exit_int_info set.
> 	 */
> 	if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
> 		struct vmcb_control_area *nc = &nested_vmcb->control;
>
> 		nc->exit_int_info = vmcb->control.event_inj;
> 		nc->exit_int_info_err = vmcb->control.event_inj_err;
> 	}
>
> nested_svm_vmexit is only called when we leave L2 toward L1, right? So,
> vmcb->control.event_inj might have been set on the last VMRUN emulation, and
> if that one failed, this value shall become the nested exit_int_info. So
> far, so good.
>
> But what if that injection succeeded and we are now exiting L2 past the
> execution of VMRUN, e.g. L1 intercepts the execution of some special
> instruction in L2? Doesn't the nested exit_int_info now gain a stale
> value? Or does the hardware clear the valid bit in EVENTINJ on
> successful injection? Didn't find an indication in the spec on first
> glance.

I think it should. Otherwise, even without a nested guest, the event would
be reinjected on the next entry.

>
> Otherwise the logic seems to be like this:
> - EVENTINJ is set to the nested value on VMRUN emulation, and only
>   there (that's in contrast to current VMX, but it makes sense)
> - Interrupt completion with state transfer to the VCPU event queues is
>   *only* performed on L2-to-L1 exits (that's like VMX is trying to do
>   it as well)
> - There is a special case around nested.exit_required that I didn't
>   fully get yet, nor can I say how it corresponds to logic in VMX.
>
> Jan
>

--
			Gleb.
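Whether the quoted nested_svm_vmexit code can hand L1 a stale exit_int_info comes down to one assumption: that a successful injection leaves the valid bit of EVENTINJ clear. The C sketch below is a hypothetical model of the quoted logic under that assumption, not KVM code: `vmcb_ctrl`, `model_vmrun` and `model_nested_vmexit` are made-up names, and only the use of bit 31 as the valid bit is taken from the AMD APM. If the assumption did not hold, the same sequence would report an already delivered event to L1 as still pending, which is exactly the stale-value case raised in the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Valid bit shared by EVENTINJ and EXITINTINFO (bit 31, AMD APM). */
#define EVTINJ_VALID (1u << 31)

/* Hypothetical, pared-down VMCB control area: only the fields quoted above. */
struct vmcb_ctrl {
    uint32_t event_inj, event_inj_err;         /* event to inject on VMRUN */
    uint32_t exit_int_info, exit_int_info_err; /* event in flight at #VMEXIT */
};

/* Hypothetical model of a hardware VMRUN of L2.  ASSUMPTION (per the reply,
 * not verified against the spec): a successfully delivered event leaves
 * EVENTINJ.V clear, so it cannot be injected a second time. */
struct vmcb_ctrl model_vmrun(struct vmcb_ctrl vmcb, bool delivered)
{
    if (delivered)
        vmcb.event_inj &= ~EVTINJ_VALID;
    return vmcb;
}

/* Emulated L2-to-L1 #VMEXIT: returns the exit_int_info pair L1 would see.
 * A still-valid event_inj means the injection never happened, so it is
 * reported to L1 as the interrupted event instead of being lost. */
struct vmcb_ctrl model_nested_vmexit(struct vmcb_ctrl vmcb)
{
    struct vmcb_ctrl nested = {
        .exit_int_info     = vmcb.exit_int_info,
        .exit_int_info_err = vmcb.exit_int_info_err,
    };

    if (vmcb.event_inj & EVTINJ_VALID) {
        /* Mirrors the quoted comment: both cannot be valid at once. */
        assert(!(vmcb.exit_int_info & EVTINJ_VALID));
        nested.exit_int_info     = vmcb.event_inj;
        nested.exit_int_info_err = vmcb.event_inj_err;
    }
    return nested;
}
```

Under the assumption, an event whose VMRUN injection succeeded reaches the emulated exit with its valid bit already clear, so nothing is copied and L1 sees only what the hardware reported.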