From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [PATCH 10/11] VMX: work around lacking VNMI support
Date: Tue, 23 Sep 2008 17:15:01 +0200
Message-ID: <48D907F5.2000401@siemens.com>
References: <48D74CE6.5060008@siemens.com> <48D8AF84.3020707@siemens.com> <20080923090021.GB3072@minantech.com> <200809231708.09617.sheng.yang@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Gleb Natapov <gleb@qumranet.com>, kvm-devel <kvm@vger.kernel.org>,
	Avi Kivity <avi@redhat.com>
To: "Yang, Sheng" <sheng.yang@intel.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from gecko.sbs.de ([194.138.37.40]:16721 "EHLO gecko.sbs.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752040AbYIWPPV (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 23 Sep 2008 11:15:21 -0400
In-Reply-To: <200809231708.09617.sheng.yang@intel.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Yang, Sheng wrote:
> On Tuesday 23 September 2008 17:00:21 Gleb Natapov wrote:
>> On Tue, Sep 23, 2008 at 10:57:40AM +0200, Jan Kiszka wrote:
>>> Gleb Natapov wrote:
>>>> On Tue, Sep 23, 2008 at 10:46:38AM +0200, Jan Kiszka wrote:
>>>>> Gleb Natapov wrote:
>>>>>> On Mon, Sep 22, 2008 at 09:59:07AM +0200, Jan Kiszka wrote:
>>>>>>> @@ -2356,6 +2384,19 @@ static void vmx_inject_nmi(struct kvm_vc
>>>>>>>  {
>>>>>>>          struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>>>>>
>>>>>>> +        if (!cpu_has_virtual_nmis()) {
>>>>>>> +                /*
>>>>>>> +                 * Tracking the NMI-blocked state in software is built upon
>>>>>>> +                 * finding the next open IRQ window. This, in turn, depends on
>>>>>>> +                 * well-behaving guests: They have to keep IRQs disabled at
>>>>>>> +                 * least as long as the NMI handler runs. Otherwise we may
>>>>>>> +                 * cause NMI nesting, maybe breaking the guest. But as this is
>>>>>>> +                 * highly unlikely, we can live with the residual risk.
>>>>>>> +                 */
>>>>>>> +                vmx->soft_vnmi_blocked = 1;
>>>>>>> +                vmx->vnmi_blocked_time = 0;
>>>>>>> +        }
>>>>>>> +
>>>>>> We still get here with vmx->soft_vnmi_blocked = 1. Trying to find out
>>>>>> how.
>>>>> We should only come along here with vnmi blocked on reinjection (after
>>>>> a fault on calling the handler).
>>>> I see that nmi_injected is never cleared, and it is checked before calling
>>>> vmx_inject_nmi();
>>> That should happen in vmx_complete_interrupts, but only if the exit
>>> takes place after the NMI has been successfully delivered to the guest
>>> (which is not the case if invoking the handler raises an exception). So
>>> much for the theory...
>> Okay, I have this one in dmesg:
>> kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
>>
> Oh... Another task switch issue...

Maybe that pending vector is #2, the NMI that is supposed to trigger the
task switch?

> 
> I think it may not be an issue introduced by this patchset? It seems to need
> more debugging...
> 
> The patchset is OK with me, except that I don't know when we would need the
> timeout (buggy guests?...), and we may also need to root-cause this issue, or
> at least make sure it's not a regression.

The timeout is indeed for buggy guests:

disable_irqs();
spin_endlessly();

Linux, e.g., needs more than one watchdog NMI over this code to detect
that there is a lock-up. With soft VNMIs and their timeout, this
detection will take longer than on real hardware, but it will still work. And
one second is large enough to practically avoid breaking into a running
NMI handler (unless the guest is totally screwed and spins inside that
handler).

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux