Re: [PATCH] KVM: nVMX: Rework event injection and recovery

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jan Kiszka <jan.kiszka@siemens.com>
To: "Nadav Har'El" <nyh@math.technion.ac.il>
Cc: Gleb Natapov <gleb@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>, kvm <kvm@vger.kernel.org>,
	"Nakajima, Jun" <jun.nakajima@intel.com>
Subject: Re: [PATCH] KVM: nVMX: Rework event injection and recovery
Date: Wed, 20 Feb 2013 15:37:51 +0100	[thread overview]
Message-ID: <5124DFBF.3070109@siemens.com> (raw)
In-Reply-To: <20130220141452.GA11902@fermat.math.technion.ac.il>

On 2013-02-20 15:14, Nadav Har'El wrote:
> Hi,
> 
> By the way, if you haven't seen my description of why the current code
> did what it did, take a look at
> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html
> Another description might also come in handy:
> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html
> 
> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery":
>> This aligns VMX more with SVM regarding event injection and recovery for
>> nested guests. The changes allow to inject interrupts directly from L0
>> to L2.
>>
>> One difference to SVM is that we always transfer the pending event
>> injection into the architectural state of the VCPU and then drop it from
>> there if it turns out that we left L2 to enter L1.
> 
> Last time I checked, if I'm remembering correctly, the nested SVM code did
> something a bit different: After the exit from L2 to L1 and unnecessarily
> queuing the pending interrupt for injection, it skipped one entry into L1,
> and as usual after the entry the interrupt queue is cleared so next time
> around, when L1 one is really entered, the wrong injection is not attempted.
> 
>> VMX and SVM are now identical in how they recover event injections from
>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
>> still contains a valid event and, if yes, transfer the content into L1's
>> idt_vectoring_info_field.
> 
>> To avoid that we incorrectly leak an event into the architectural VCPU
>> state that L1 wants to inject, we skip cancellation on nested run.
> 
> I didn't understand this last point.

- prepare_vmcs02 sets event to be injected into L2
- while trying to enter L2, a cancel condition is met
- we call vmx_cancel_interrupts but should now avoid filling L1's event
  into the arch event queues - it's kept in vmcs12

> 
>> @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>>  	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>>  
>> -	/* clear vm-entry fields which are to be cleared on exit */
>> -	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
>> +	/* drop what we picked up for L0 via vmx_complete_interrupts */
>> +	vcpu->arch.nmi_injected = false;
>> +	kvm_clear_exception_queue(vcpu);
>> +	kvm_clear_interrupt_queue(vcpu);
> 
> It would be nice to move these lines out of prepare_vmcs12(), since they
> don't really do anything with vmcs12, and move it into
> nested_vmx_vmexit() (which is the one which called prepare_vmcs12()).

OK.

> 
> Did you test this both with PIN_BASED_EXT_INTR_MASK (the usual case) and
> !PIN_BASED_EXT_INTR_MASK (the case which interests you)? We need to make
> sure that in the former case, this doesn't clear the interrupt queue after
> we put an interrupt to be injected in it (at first glance it seems fine,
> but these code paths are so convoluted, it's hard to be sure).

I tested both, but none of my tests was close to cover all potential
corner cases. But that unconditional queue clearing surely deserves
attention and critical review.

> 
>> +	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) &&
>> +	    vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) {
>> +		/*
>> +		 * Preserve the event that was supposed to be injected
>> +		 * by emulating it would have been returned in
>> +		 * IDT_VECTORING_INFO_FIELD.
>> +		 */
>> +		if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
>> +		    INTR_INFO_VALID_MASK) {
>> +			vmcs12->idt_vectoring_info_field =
>> +				vmcs12->vm_entry_intr_info_field;
>> +			vmcs12->idt_vectoring_error_code =
>> +				vmcs12->vm_entry_exception_error_code;
>> +			vmcs12->vm_exit_instruction_len =
>> +				vmcs12->vm_entry_instruction_len;
>> +			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> 
> I'm afraid I'm missing what you are trying to do here. Why would
> vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK ever be
> true? After all, the processor clears it after each sucessful exit so
> last if() will only succeed on failed entries - but this is NOT the
> case if we're in the enclosing if (note that vmcs12->vm_exit_reason  =
> vmcs_read32(VM_EXIT_REASON)). Maybe I'm missing something?

Canceled vmentry as indicated above. Look at vcpu_enter_guest:
kvm_mmu_reload may fail, or we need to handle some async event / perform
some reschedule. But those points are past prepare_vmcs02.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

next prev parent reply	other threads:[~2013-02-20 14:38 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-20 13:01 [PATCH] KVM: nVMX: Rework event injection and recovery Jan Kiszka
2013-02-20 14:14 ` Nadav Har'El
2013-02-20 14:37   ` Jan Kiszka [this message]
2013-02-20 17:01     ` Gleb Natapov
2013-02-20 17:24       ` Jan Kiszka
2013-02-20 17:50         ` Jan Kiszka
2013-02-21  9:22           ` Gleb Natapov
2013-02-21  9:43             ` Jan Kiszka
2013-02-21 10:06               ` Gleb Natapov
2013-02-21 10:18                 ` Jan Kiszka
2013-02-21 10:28                   ` Jan Kiszka
2013-02-21 10:33                     ` Jan Kiszka
2013-02-21 13:13                       ` Gleb Natapov
2013-02-21 13:22                         ` Jan Kiszka
2013-02-21 13:37                           ` Nadav Har'El
2013-02-21 13:45                             ` Gleb Natapov
2013-02-21 13:28                         ` Nadav Har'El
2013-02-20 14:53 ` Jan Kiszka
2013-02-20 15:30   ` Gleb Natapov
2013-02-20 15:51     ` Jan Kiszka
2013-02-20 15:57       ` Gleb Natapov
2013-02-20 16:00         ` Jan Kiszka
2013-02-20 16:46 ` Gleb Natapov
2013-02-20 16:48   ` Jan Kiszka
2013-02-20 16:51     ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5124DFBF.3070109@siemens.com \
    --to=jan.kiszka@siemens.com \
    --cc=gleb@redhat.com \
    --cc=jun.nakajima@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=nyh@math.technion.ac.il \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.