From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jan Kiszka <jan.kiszka@web.de>
Subject: Re: [PATCH] KVM: nVMX: Fix direct injection of interrupts from L0
 to L2
Date: Tue, 19 Feb 2013 11:04:01 +0100
Message-ID: <51234E11.7050801@web.de>
References: <511FBD76.8010307@web.de> <20130217150721.GU9817@redhat.com> <5120F7CE.6050905@web.de> <20130217162617.GW9817@redhat.com> <51210CD1.3010208@web.de> <20130217173534.GB15961@redhat.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="----enig2ICJXVFETHQKUATCESPNS"
Cc: Marcelo Tosatti <mtosatti@redhat.com>, kvm <kvm@vger.kernel.org>,
	Nadav Har'El <nyh@math.technion.ac.il>,
	"Nakajima, Jun" <jun.nakajima@intel.com>
To: Gleb Natapov <gleb@redhat.com>, Joerg Roedel <joro@8bytes.org>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mout.web.de ([212.227.15.4]:62866 "EHLO mout.web.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1758340Ab3BSKEa (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 19 Feb 2013 05:04:30 -0500
In-Reply-To: <20130217173534.GB15961@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
------enig2ICJXVFETHQKUATCESPNS
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 2013-02-17 18:35, Gleb Natapov wrote:
> On Sun, Feb 17, 2013 at 06:01:05PM +0100, Jan Kiszka wrote:
>> On 2013-02-17 17:26, Gleb Natapov wrote:
>>> On Sun, Feb 17, 2013 at 04:31:26PM +0100, Jan Kiszka wrote:
>>>> On 2013-02-17 16:07, Gleb Natapov wrote:
>>>>> On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote:
>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>
>>>>>> If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped
>>>>>> vmx_complete_interrupts on L2 exits. This is required because, with
>>>>>> direct interrupt injection from L0 to L2, L0 has to update its pending
>>>>>> events.
>>>>>>
>>>>>> Also, we need to allow vmx_cancel_injection when entering L2 if we left
>>>>>> to L0. This condition is indirectly derived from the absence of valid
>>>>>> vectoring info in vmcs12. We now explicitly clear it if we find out that
>>>>>> the L2 exit is not targeting L1 but L0.
>>>>>>
>>>>> We really need to overhaul how interrupt injection is emulated in nested
>>>>> VMX. Why not put pending events into event queue instead of
>>>>> get_vmcs12(vcpu)->idt_vectoring_info_field and inject them in usual way.
>>>>
>>>> I was thinking about the same step but felt unsure so far if
>>>> vmx_complete_interrupts & Co. do not include any assumptions about the
>>>> vmcs configuration that won't match what L1 does. So I went for a
>>>> different path first, specifically to avoid impact on these hairy bits
>>>> for non-nested mode.
>>>>
>>> Assumption made by those functions should be still correct since guest
>>> VMCS configuration is not applied directly to real HW, but we should be
>>> careful of course. For instance interrupt queues should be cleared
>>> during nested vmexit and event transferred back to idt_vectoring_info_field.
>>> IIRC this is how nested SVM works BTW.
>>
>> Checking __vmx_complete_interrupts, the first issue I find is that type
>> 5 (privileged software exception) is not decoded, thus will be lost if
>> L2 leaves this way. That's a reason why it might be better to re-inject
>> the content of vmcs12 if it is valid. VMX is a bit more hairy than SVM,
>> I guess.
>>
> I do not see type 5 in SDM Table 24-15. We handle every type specified
> there. Why shouldn't we? SVM and VMX are pretty close in regards to
> event injection, this allowed us to move a lot of logic into the common
> code.
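
To make the "type 5" point concrete, here is a minimal sketch of decoding
the interruption type from a VMX interruption-information value. It assumes
the usual layout (vector in bits 7:0, type in bits 10:8, valid in bit 31)
and the type names as I read them in the SDM; the macro names and
intr_type_name() are hypothetical helpers for illustration, not the
kernel's definitions.

```c
#include <stdint.h>

/* Assumed field layout of an interruption-information value. */
#define TOY_INTR_INFO_VECTOR_MASK 0x000000ffu /* bits 7:0  */
#define TOY_INTR_INFO_TYPE_MASK   0x00000700u /* bits 10:8 */
#define TOY_INTR_INFO_TYPE_SHIFT  8
#define TOY_INTR_INFO_VALID_MASK  0x80000000u /* bit 31    */

/* Hypothetical helper: map the type field to a readable name. */
static const char *intr_type_name(uint32_t info)
{
	switch ((info & TOY_INTR_INFO_TYPE_MASK) >> TOY_INTR_INFO_TYPE_SHIFT) {
	case 0: return "external interrupt";
	case 2: return "NMI";
	case 3: return "hardware exception";
	case 4: return "software interrupt";
	case 5: return "privileged software exception";
	case 6: return "software exception";
	default: return "reserved/other";
	}
}

/* Hypothetical helper: extract the vector number. */
static unsigned int intr_vector(uint32_t info)
{
	return info & TOY_INTR_INFO_VECTOR_MASK;
}
```

A decoder that has no case for type 5 would fall through to the default
branch here, which is the kind of loss I am worried about above.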

I had a look at SVM to check how it deals with this, but I'm not sure
if I understand the logic correctly. SVM does:

static int nested_svm_vmexit(struct vcpu_svm *svm)
{
	...
	/*
	 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
	 * to make sure that we do not lose injected events. So check event_inj
	 * here and copy it to exit_int_info if it is valid.
	 * Exit_int_info and event_inj can't be both valid because the case
	 * below only happens on a VMRUN instruction intercept which has
	 * no valid exit_int_info set.
	 */
	if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
		struct vmcb_control_area *nc = &nested_vmcb->control;

		nc->exit_int_info     = vmcb->control.event_inj;
		nc->exit_int_info_err = vmcb->control.event_inj_err;
	}
	...
}

nested_svm_vmexit is only called when we leave L2 toward L1, right? So
vmcb->control.event_inj might have been set on the last VMRUN emulation,
and if that injection failed, this value becomes the nested exit_int_info.
So far, so good.

But what if that injection succeeded and we are now exiting L2 after
VMRUN has completed, e.g. because L1 intercepts some special instruction
executed in L2? Doesn't the nested exit_int_info then pick up a stale
value? Or does the hardware clear the valid bit in EVENTINJ on
successful injection? I didn't find an indication in the spec at first
glance.
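
To spell out the scenario, here is a toy model of the copy rule from the
excerpt above. It is only a sketch: the toy_* names and the struct are
made up for illustration, and whether the hardware clears the valid bit
is modelled as a parameter precisely because I don't know the answer.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_EVTINJ_VALID (1u << 31) /* stands in for SVM_EVTINJ_VALID */

/* Toy stand-in for the two control fields involved. */
struct toy_control {
	uint32_t event_inj;
	uint32_t exit_int_info;
};

/*
 * Toy model of running L2 once. 'delivered' says whether the injected
 * event reached the guest; 'hw_clears_valid' is the open question:
 * does the hardware drop the valid bit after a successful injection?
 */
static void toy_run_l2(struct toy_control *c, bool delivered,
		       bool hw_clears_valid)
{
	if (delivered && hw_clears_valid)
		c->event_inj &= ~TOY_EVTINJ_VALID;
}

/* Same copy rule as in the quoted nested_svm_vmexit() excerpt. */
static void toy_nested_vmexit(const struct toy_control *c,
			      struct toy_control *nested)
{
	if (c->event_inj & TOY_EVTINJ_VALID)
		nested->exit_int_info = c->event_inj;
}

/* Does L1 see a valid exit_int_info after the emulated #VMEXIT? */
static bool toy_l1_sees_event(uint32_t event_inj, bool delivered,
			      bool hw_clears_valid)
{
	struct toy_control c = { .event_inj = event_inj, .exit_int_info = 0 };
	struct toy_control nested = { .event_inj = 0, .exit_int_info = 0 };

	toy_run_l2(&c, delivered, hw_clears_valid);
	toy_nested_vmexit(&c, &nested);
	return (nested.exit_int_info & TOY_EVTINJ_VALID) != 0;
}
```

With delivered = true and hw_clears_valid = false, toy_l1_sees_event()
returns true, i.e. L1 would be told about an event that L2 already
received. That is the stale value I mean.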

Otherwise the logic seems to be like this:
 - EVENTINJ is set to the nested value on VMRUN emulation, and only
   there (that's in contrast to current VMX, but it makes sense)
 - Interrupt completion with state transfer to the VCPU event queues is
   *only* performed on L2-to-L1 exits (that's what VMX is trying to do
   as well)
 - There is a special case around nested.exit_required that I didn't
   fully get yet, nor can I say how it corresponds to the logic in VMX.

Jan


------enig2ICJXVFETHQKUATCESPNS
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlEjTh0ACgkQitSsb3rl5xSJWwCgyY/neEndQ4B6lXLmpZaR5KBF
M0kAoLFSal0vq3vclKUcN4moku6hXcAV
=+R0F
-----END PGP SIGNATURE-----

------enig2ICJXVFETHQKUATCESPNS--