Re: [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery

From: Jan Kiszka <jan.kiszka@web.de>
To: Gleb Natapov <gleb@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>, kvm <kvm@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Nadav Har'El <nyh@math.technion.ac.il>
Subject: Re: [PATCH v2 2/5] KVM: nVMX: Rework event injection and recovery
Date: Sun, 17 Mar 2013 16:02:07 +0100
Message-ID: <5145DAEF.6000400@web.de>
In-Reply-To: <20130317134504.GI11223@redhat.com>

On 2013-03-17 14:45, Gleb Natapov wrote:
> On Sat, Mar 16, 2013 at 11:23:16AM +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> The basic idea is to always transfer the pending event injection on
>> vmexit into the architectural state of the VCPU and then drop it from
>> there if it turns out that we left L2 to enter L1.
>>
>> VMX and SVM are now identical in how they recover event injections from
>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
>> still contains a valid event and, if yes, transfer the content into L1's
>> idt_vectoring_info_field.
>>
> But how can this happen with the VMX code? VMX has this nested_run_pending
> thing that prevents #vmexit emulation from happening without vmlaunch.
> This means that VM_ENTRY_INTR_INFO_FIELD should never be valid during
> #vmexit emulation since it is marked invalid during vmlaunch.

Now that nmi/interrupt_allowed is strict wrt nested_run_pending again,
it may indeed no longer happen. It was definitely a problem before, also
with direct vmexit on pending INIT. This requires a second thought, and
maybe also a WARN_ON(vmx->nested.nested_run_pending) in nested_vmx_vmexit.
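
To make the invariant concrete, here is a minimal user-space sketch (not
kernel code). It assumes that a pending nested run must never coincide
with an emulated vmexit, and it models WARN_ON() with a counter. The
struct and all model_* names are hypothetical; only nested_run_pending
comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the nested state kept per vCPU. */
struct nested_model {
	bool guest_mode;         /* vCPU is currently in L2 */
	bool nested_run_pending; /* L1 issued vmlaunch/vmresume, L2 has not run yet */
	unsigned int warnings;   /* number of tripped sanity checks */
};

/* Models WARN_ON(): record the violation, keep going, return the condition. */
static bool model_warn_on(struct nested_model *n, bool cond)
{
	if (cond) {
		n->warnings++;
		fprintf(stderr, "WARN: vmexit emulated while nested run pending\n");
	}
	return cond;
}

/* Models L1 executing vmlaunch/vmresume. */
static void model_vmlaunch(struct nested_model *n)
{
	n->guest_mode = true;
	n->nested_run_pending = true;
}

/* Models the first real entry into L2, which completes the pending run. */
static void model_l2_entered(struct nested_model *n)
{
	assert(n->guest_mode);
	n->nested_run_pending = false;
}

/* Models the L2->L1 switch; returns true if a switch took place. */
static bool model_nested_vmexit(struct nested_model *n)
{
	model_warn_on(n, n->nested_run_pending);
	if (!n->guest_mode)
		return false;
	n->guest_mode = false;
	n->nested_run_pending = false;
	return true;
}
```

In this model, a vmexit that follows model_l2_entered() leaves the
warning counter untouched, while a vmexit issued directly after
model_vmlaunch() increments it.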

> 
>> However, we differ on how to deal with events that L0 wanted to inject
>> into L2. Likely, this case is still broken in SVM. For VMX, the function
>> vmcs12_save_pending_events deals with transferring pending L0 events
>> into the queue of L1. That is mandatory as L1 may decide to switch the
>> guest state completely, invalidating or preserving the pending events
>> for later injection (including on a different node, once we support
>> migration).
>>
>> Note that we treat directly injected NMIs differently as they can hit
>> both L1 and L2. In this case, we let L0 try to inject again, also over
>> L1, after leaving L2.
>>
> Hmm, where does the SDM say NMI behaves this way?

NMIs are only blocked in root mode if we took an NMI-related vmexit (or,
of course, an NMI is being processed). Thus, every arriving NMI can
either hit the guest or the host - pure luck.

However, I have missed the fact that an NMI may have been injected from
L1 as well. If injection triggers a vmexit, that NMI could now leak into
L1. So we have to process them as well in vmcs12_save_pending_events.
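
As a rough illustration of what that could look like, here is a
user-space sketch that encodes a pending event, including an injected
NMI, into the IDT-vectoring information format (valid bit 31, type in
bits 10:8, error-code flag in bit 11, per the SDM). The struct, the
function name, and the exception > NMI > interrupt priority order are
assumptions for illustration. Soft events and the instruction length
are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTORING_INFO_VALID_MASK        (1u << 31)
#define VECTORING_INFO_DELIVER_CODE_MASK (1u << 11)
#define INTR_TYPE_EXT_INTR               (0u << 8)
#define INTR_TYPE_NMI_INTR               (2u << 8)
#define INTR_TYPE_HARD_EXCEPTION         (3u << 8)
#define NMI_VECTOR                       2u

/* Hypothetical, simplified view of the vCPU's architectural event queue. */
struct pending_model {
	bool exception_pending;
	unsigned int exception_nr;
	bool has_error_code;
	bool nmi_injected;
	bool interrupt_pending;
	unsigned int interrupt_nr;
};

/* Returns the value to report as IDT-vectoring info, or 0 if nothing is pending. */
static uint32_t model_idt_vectoring(const struct pending_model *p)
{
	uint32_t info;

	if (p->exception_pending) {
		assert(p->exception_nr <= 0xff);
		info = p->exception_nr | INTR_TYPE_HARD_EXCEPTION |
		       VECTORING_INFO_VALID_MASK;
		if (p->has_error_code)
			info |= VECTORING_INFO_DELIVER_CODE_MASK;
		return info;
	}
	if (p->nmi_injected)
		return NMI_VECTOR | INTR_TYPE_NMI_INTR |
		       VECTORING_INFO_VALID_MASK;
	if (p->interrupt_pending) {
		assert(p->interrupt_nr <= 0xff);
		return p->interrupt_nr | INTR_TYPE_EXT_INTR |
		       VECTORING_INFO_VALID_MASK;
	}
	return 0;
}
```

The point of the NMI branch is that an NMI injected by L1 ends up in
L1's vectoring info instead of staying queued in L0.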

> 
>> To avoid that we incorrectly leak an event into the architectural VCPU
>> state that L1 wants to inject, we skip cancellation on nested run.
>>
> How can the leak happen?

See above, this likely no longer applies.

> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/kvm/vmx.c |  118 ++++++++++++++++++++++++++++++++++++++--------------
>>  1 files changed, 87 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 126d047..ca74358 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6492,8 +6492,6 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
>>  
>>  static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>>  {
>> -	if (is_guest_mode(&vmx->vcpu))
>> -		return;
>>  	__vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
>>  				  VM_EXIT_INSTRUCTION_LEN,
>>  				  IDT_VECTORING_ERROR_CODE);
>> @@ -6501,7 +6499,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>>  
>>  static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>>  {
>> -	if (is_guest_mode(vcpu))
>> +	if (to_vmx(vcpu)->nested.nested_run_pending)
>>  		return;
>>  	__vmx_complete_interrupts(vcpu,
>>  				  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
>> @@ -6534,21 +6532,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>  	unsigned long debugctlmsr;
>>  
>> -	if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
>> -		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>> -		if (vmcs12->idt_vectoring_info_field &
>> -				VECTORING_INFO_VALID_MASK) {
>> -			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>> -				vmcs12->idt_vectoring_info_field);
>> -			vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
>> -				vmcs12->vm_exit_instruction_len);
>> -			if (vmcs12->idt_vectoring_info_field &
>> -					VECTORING_INFO_DELIVER_CODE_MASK)
>> -				vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
>> -					vmcs12->idt_vectoring_error_code);
>> -		}
>> -	}
>> -
>>  	/* Record the guest's net vcpu time for enforced NMI injections. */
>>  	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
>>  		vmx->entry_time = ktime_get();
>> @@ -6707,17 +6690,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>  
>>  	vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>>  
>> -	if (is_guest_mode(vcpu)) {
>> -		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>> -		vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info;
>> -		if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) {
>> -			vmcs12->idt_vectoring_error_code =
>> -				vmcs_read32(IDT_VECTORING_ERROR_CODE);
>> -			vmcs12->vm_exit_instruction_len =
>> -				vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>> -		}
>> -	}
>> -
>>  	vmx->loaded_vmcs->launched = 1;
>>  
>>  	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
>> @@ -7324,6 +7296,52 @@ vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  			vcpu->arch.cr4_guest_owned_bits));
>>  }
>>  
>> +static void vmcs12_save_pending_events(struct kvm_vcpu *vcpu,
>> +				       struct vmcs12 *vmcs12)
>> +{
>> +	u32 idt_vectoring;
>> +	unsigned int nr;
>> +
>> +	/*
>> +	 * We only transfer exceptions and maskable interrupts. It is fine if
>> +	 * L0 retries to inject a pending NMI over L1.
>> +	 */
>> +	if (vcpu->arch.exception.pending) {
>> +		nr = vcpu->arch.exception.nr;
>> +		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
>> +
>> +		if (kvm_exception_is_soft(nr)) {
>> +			vmcs12->vm_exit_instruction_len =
>> +				vcpu->arch.event_exit_inst_len;
>> +			idt_vectoring |= INTR_TYPE_SOFT_EXCEPTION;
>> +		} else
>> +			idt_vectoring |= INTR_TYPE_HARD_EXCEPTION;
>> +
>> +		if (vcpu->arch.exception.has_error_code) {
>> +			idt_vectoring |= VECTORING_INFO_DELIVER_CODE_MASK;
>> +			vmcs12->idt_vectoring_error_code =
>> +				vcpu->arch.exception.error_code;
>> +		}
>> +
>> +		vmcs12->idt_vectoring_info_field = idt_vectoring;
>> +	} else if (vcpu->arch.interrupt.pending) {
>> +		nr = vcpu->arch.interrupt.nr;
>> +		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
>> +
>> +		if (vcpu->arch.interrupt.soft) {
>> +			idt_vectoring |= INTR_TYPE_SOFT_INTR;
>> +			vmcs12->vm_entry_instruction_len =
>> +				vcpu->arch.event_exit_inst_len;
>> +		} else
>> +			idt_vectoring |= INTR_TYPE_EXT_INTR;
>> +
>> +		vmcs12->idt_vectoring_info_field = idt_vectoring;
>> +	}
>> +
>> +	kvm_clear_exception_queue(vcpu);
>> +	kvm_clear_interrupt_queue(vcpu);
>> +}
>> +
>>  /*
>>   * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
>>   * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
>> @@ -7415,9 +7433,47 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>>  	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>>  
>> -	/* clear vm-entry fields which are to be cleared on exit */
>> -	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
>> +	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
>> +		if ((vmcs12->vm_entry_intr_info_field &
>> +		     INTR_INFO_VALID_MASK) &&
>> +		    (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
>> +		     INTR_INFO_VALID_MASK)) {
> Again I do not see how this condition can be true.
> 
>> +			/*
>> +			 * Preserve the event that was supposed to be injected
>> +			 * by L1 by emulating how it would have been returned
>> +			 * in IDT_VECTORING_INFO_FIELD.
>> +			 */
>> +			vmcs12->idt_vectoring_info_field =
>> +				vmcs12->vm_entry_intr_info_field;
>> +			vmcs12->idt_vectoring_error_code =
>> +				vmcs12->vm_entry_exception_error_code;
>> +			vmcs12->vm_exit_instruction_len =
>> +				vmcs12->vm_entry_instruction_len;
>> +			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
>> +
>> +			/*
>> +			 * We do not drop NMIs that targeted L2 below as they
>> +			 * can also be reinjected over L1. But if this event
>> +			 * was an NMI, it was synthetic and came from L1.
>> +			 */
>> +			vcpu->arch.nmi_injected = false;
>> +		} else
>> +			/*
>> +			 * Transfer the event L0 may have wanted to inject into
>> +			 * L2 to IDT_VECTORING_INFO_FIELD.
>> +			 */
> I do not understand the comment. This transfers an event from the event
> queue into vmcs12. Since vmx_complete_interrupts() also transfers the
> event that L1 tried to inject into the event queue, here we handle not
> only L0->L2 but also L1->L2 events.

I'm not sure if I fully understand your remark. Is it that the comment
is only talking about L0 events? That is indeed not fully true; L1
events should make it to the architectural queue as well. Will adjust this.

> In fact, I think only the "else" part of this if() is needed.

Yes, probably.

Thanks,
Jan


