From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: Linux Crash Caused By KVM?
Date: Sun, 15 Apr 2012 13:05:38 +0300
Message-ID: <4F8A9D72.7070506@redhat.com>
References: <4F8598FE.5020400@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Peijie Yu, kvm@vger.kernel.org
To: Eric Northup
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:35877 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753833Ab2DOKFm (ORCPT ); Sun, 15 Apr 2012 06:05:42 -0400
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On 04/11/2012 09:59 PM, Eric Northup wrote:
> On Wed, Apr 11, 2012 at 7:45 AM, Avi Kivity wrote:
> > On 04/11/2012 05:11 AM, Peijie Yu wrote:
> >> For this problem, i found that panic is caused by
> >> BUG_ON(in_nmi()) which means NMI happened during another NMI Context;
> >> But i check the Intel Technical Manual and found "While an NMI
> >> interrupt handler is executing, the processor disables additional
> >> calls to the NMI handler until the next IRET instruction is executed."
> >> So, how this happen?
> >>
> >
> > The NMI path for kvm is different; the processor exits from the guest
> > with NMIs blocked, then executes kvm code until it issues "int $2" in
> > vmx_complete_interrupts(). If an IRET is executed in this path, then
> > NMIs will be unblocked and nested NMIs may occur.
> >
> > One way this can happen is if we access the vmap area and incur a fault,
> > between the VMEXIT and invoking the NMI handler. Or perhaps the NMI
> > handler itself generates a fault. Or we have a debug exception in that path.
> >
> > Is this reproducible?
>
> As an FYI, there have been BIOSes whose SMI handlers ran IRETs. So
> the NMI blocking can go away surprisingly.
>
> See 29.8 "NMI handling while in SMM" in the Intel SDM vol 3.

Interesting, thanks.
From 29.8 it looks like you don't even need to issue IRET within SMM,
since SMM doesn't save/restore the NMI blocking flag.

However, this being a server, and the crash being in kvm code, I don't
think we can rule out that this is a kvm bug.

-- 
error compiling committee.c: too many arguments to function