From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Liran Alon <liran.alon@oracle.com>
Cc: kvm@vger.kernel.org, "Paolo Bonzini" <pbonzini@redhat.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
"Jon Doron" <arilou@gmail.com>,
"Sean Christopherson" <sean.j.christopherson@intel.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86: nVMX: allow RSM to restore VMXE CR4 flag
Date: Wed, 27 Mar 2019 11:08:34 +0100 [thread overview]
Message-ID: <8736n8aau5.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <06E50BD4-B3AC-4DBB-B700-80C30F2DC8BB@oracle.com>
Liran Alon <liran.alon@oracle.com> writes:
>> On 26 Mar 2019, at 15:48, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>
>> Liran Alon <liran.alon@oracle.com> writes:
>>
>>>> On 26 Mar 2019, at 15:07, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>> - Instread of putting the temporary HF_SMM_MASK drop to
>>>> rsm_enter_protected_mode() (as was suggested by Liran), move it to
>>>> emulator_set_cr() modifying its interface. emulate.c seems to be
>>>> vcpu-specifics-free at this moment, we may want to keep it this way.
>>>> - It seems that Hyper-V+UEFI on KVM is still broken, I'm observing sporadic
>>>> hangs even with this patch. These hangs, however, seem to be unrelated to
>>>> rsm.
>>>
>>> Feel free to share details on these hangs ;)
>>>
>>
>> You've asked for it)
>>
>> The immediate issue I'm observing is some sort of a lockup which is easy
>> to trigger with e.g. "-usb -device usb-tablet" on Qemu command line; it
>> seems we get too many interrupts and combined with preemtion timer for
>> L2 we're not making any progress:
>>
>> kvm_userspace_exit: reason KVM_EXIT_IOAPIC_EOI (26)
>> kvm_set_irq: gsi 18 level 1 source 0
>> kvm_msi_set_irq: dst 0 vec 177 (Fixed|physical|level)
>> kvm_apic_accept_irq: apicid 0 vec 177 (Fixed|edge)
>> kvm_fpu: load
>> kvm_entry: vcpu 0
>> kvm_exit: reason VMRESUME rip 0xfffff80000848115 info 0 0
>> kvm_entry: vcpu 0
>> kvm_exit: reason PREEMPTION_TIMER rip 0xfffff800f4448e01 info 0 0
>> kvm_nested_vmexit: rip fffff800f4448e01 reason PREEMPTION_TIMER info1 0 info2 0 int_info 0 int_info_err 0
>> kvm_nested_vmexit_inject: reason EXTERNAL_INTERRUPT info1 0 info2 0 int_info 800000b1 int_info_err 0
>> kvm_entry: vcpu 0
>> kvm_exit: reason APIC_ACCESS rip 0xfffff8000081fe11 info 10b0 0
>> kvm_apic: apic_write APIC_EOI = 0x0
>> kvm_eoi: apicid 0 vector 177
>> kvm_fpu: unload
>> kvm_userspace_exit: reason KVM_EXIT_IOAPIC_EOI (26)
>> ...
>> (and the pattern repeats)
>>
>> Maybe it is a usb-only/Qemu-only problem, maybe not.
>>
>> --
>> Vitaly
>
> The trace of kvm_apic_accept_irq should indicate that __apic_accept_irq() was called to inject an interrupt to L1 guest.
> (I know that now we are running in L1 because next exit is a VMRESUME).
>
> However, it is surprising to see that on next entry to guest, no interrupt was injected by vmx_inject_irq().
> It may be because L1 guest is currently running with interrupt disabled and therefore only an IRQ-window was requested.
> (Too bad we don’t have a trace for this…)
>
> Next, we got an exit from L1 guest on VMRESUME. As part of it’s handling, active VMCS was changed from vmcs01 to vmcs02.
> I believe the immediate exit later on preemption-timer was because the immediate-exit-request mechanism was invoked
> which is now implemented by setting a VMX preemption-timer with value of 0 (Thanks to Sean).
> (See vmx_vcpu_run() -> vmx_update_hv_timer() -> vmx_arm_hv_timer(vmx, 0)).
> (Note that the pending interrupt was evaluated because of a recent patch of mine to nested_vmx_enter_non_root_mode()
> to request KVM_REQ_EVENT when vmcs01 have requested an IRQ-window)
>
> Therefore when entering L2, you immediately get an exit on PREEMPTION_TIMER which will cause eventually L0 to call
> vmx_check_nested_events() which notices now the pending interrupt that should have been injected before to L1
> and now exit from L2 to L1 on EXTERNAL_INTERRUPT on vector 0xb1.
>
> Then L1 handles the interrupt by performing an EOI to LAPIC which propagate an EOI to IOAPIC which immediately re-inject
> the interrupt (after clearing the remote_irr) as the irq-line is still set. i.e. QEMU’s ioapic_eoi_broadcast() calls ioapic_service() immediate after it clears remote-irr for this pin.
>
> Also note that in trace we see only a single kvm_set_irq to level 1 but we don’t see immediately another kvm_set_irq to level 0.
> This should indicate that in QEMU’s IOAPIC redirection-table, this pin is configured as level-triggered interrupt.
> However, the trace of kvm_apic_accept_irq indicates that this interrupt is raised as an edge-triggered interrupt.
>
> To sum up:
> 1) I would create a patch to add a trace to vcpu_enter_guest() when calling enable_smi_window() / enable_nmi_window() / enable_irq_window().
> 2) It is worth investigating why MSI trigger-mode is edge-triggered instead of level-triggered.
> 3) If this is indeed a level-triggered interrupt, it is worth investigating how the interrupt source behaves. i.e. What cause this device to lower the irq-line?
> (As we don’t see any I/O Port or MMIO access by L1 guest interrupt-handler before performing the EOI)
> 4) Does this issue reproduce also when running with kernel-irqchip? (Instead of split-irqchip)
>
Thank you Liran,
all are valuable suggestions. It seems the isssue doesn't reproduce with
'kernel-irqchip=on' but reproduces with "kernel-irqchip=split". My first
guess would then be that we're less picky with in-kernel implementation
about the observed edge/level discrepancy. I'll be investigating and
share my findings.
--
Vitaly
next prev parent reply other threads:[~2019-03-27 10:08 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-26 13:07 [PATCH] KVM: x86: nVMX: allow RSM to restore VMXE CR4 flag Vitaly Kuznetsov
2019-03-26 13:11 ` Liran Alon
2019-03-26 13:48 ` Vitaly Kuznetsov
2019-03-26 15:02 ` Liran Alon
2019-03-27 10:08 ` Vitaly Kuznetsov [this message]
[not found] ` <20190327192946.19128-1-sean.j.christopherson@intel.com>
[not found] ` <87va03ml7n.fsf@vitty.brq.redhat.com>
[not found] ` <CALMp9eQPwFy5GvjDbE9wQQYEDdYfHzEwm6n1XZgQ_hCuk9vp+Q@mail.gmail.com>
2019-04-09 8:21 ` Vitaly Kuznetsov
2019-04-09 16:31 ` Paolo Bonzini
2019-04-10 9:38 ` [RFC] selftests: kvm: add a selftest for SMM Vitaly Kuznetsov
2019-04-10 10:10 ` Paolo Bonzini
2019-04-10 10:32 ` Vitaly Kuznetsov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8736n8aau5.fsf@vitty.brq.redhat.com \
--to=vkuznets@redhat.com \
--cc=arilou@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=liran.alon@oracle.com \
--cc=pbonzini@redhat.com \
--cc=rkrcmar@redhat.com \
--cc=sean.j.christopherson@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.