Re: Linux Crash Caused By KVM?

From: Avi Kivity <avi@redhat.com>
To: Peijie Yu <yupeijie1012@gmail.com>
Cc: kvm@vger.kernel.org
Subject: Re: Linux Crash Caused By KVM?
Date: Wed, 11 Apr 2012 17:45:18 +0300
Message-ID: <4F8598FE.5020400@redhat.com>
In-Reply-To: <CAKomMagyNgUwD9F0obiEjGG5WeFtZgi-0aBSDV2K=H0BQm2SvA@mail.gmail.com>

On 04/11/2012 05:11 AM, Peijie Yu wrote:
> Hi, all
>   I have run into some problems while using KVM.
>   The test environment is:
> Summary:        Dell R610, 1 x Xeon E5645 2.40GHz, 47.1GB / 48GB 1333MHz DDR3
> System:         Dell PowerEdge R610 (Dell 08GXHX)
> Processors:     1 (of 2) x Xeon E5645 2.40GHz 5860MHz FSB (HT enabled,
> 6 cores, 24 threads)
> Memory:         47.1GB / 48GB 1333MHz DDR3 == 12 x 4GB
> Disk:           sda: 299GB (72%) JBOD
> Disk:           sdb (host9): 5.0TB JBOD == 1 x VIRTUAL-DISK
> Disk:           sdc (host11): 5.0TB JBOD == 1 x VIRTUAL-DISK
> Disk:           sdd (host12): 5.0TB JBOD == 1 x VIRTUAL-DISK
> Disk:           sde (host10): 5.0TB JBOD == 1 x VIRTUAL-DISK
> Disk-Control:   mpt2sas0: LSI Logic / Symbios Logic SAS2008
> PCI-Express Fusion-MPT SAS-2 [Falcon]
> Disk-Control:   host9:
> Disk-Control:   host10:
> Disk-Control:   host11:
> Disk-Control:   host12:
> Chipset:        Intel 82801IB (ICH9)
> Network:        br1 (bridge): 14:fe:b5:dc:2c:6e
> Network:        em1 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
> 14:fe:b5:dc:2c:6e, 1000Mb/s <full-duplex>
> Network:        em2 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
> 14:fe:b5:dc:2c:70, 1000Mb/s <full-duplex>
> Network:        em3 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
> 14:fe:b5:dc:2c:72, 1000Mb/s <full-duplex>
> Network:        em4 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
> 14:fe:b5:dc:2c:74, 1000Mb/s <full-duplex>
> Network:        vnet0 (tun): fe:16:3e:49:fb:05, 10Mb/s <full-duplex>
> Network:        vnet1 (tun): fe:16:3e:cb:c0:d1, 10Mb/s <full-duplex>
> Network:        vnet2 (tun): fe:16:3e:1e:c1:c4, 10Mb/s <full-duplex>
> Network:        vnet3 (tun): fe:16:3e:d5:58:f4, 10Mb/s <full-duplex>
> Network:        vnet4 (tun): fe:16:3e:15:b4:16, 10Mb/s <full-duplex>
> Network:        vnet5 (tun): fe:16:3e:d2:07:47, 10Mb/s <full-duplex>
> Network:        vnet6 (tun): fe:16:3e:e1:2b:b9, 10Mb/s <full-duplex>
> OS:             RHEL Server 6.1 (Santiago), Linux
> 2.6.32-220.2.1.el6.x86_64 x86_64, 64-bit
> BIOS:           Dell 3.0.0 01/31/2011
>
>   While I was using KVM, some issues occurred:
>   1.   Host Crash Caused by
>       a.   Kernel Panic
>       KERNEL: /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux
>     DUMPFILE: ../vmcore_2012.13.46  [PARTIAL DUMP]
>         CPUS: 24
>         DATE: Wed Jan 11 13:34:13 2012
>       UPTIME: 25 days, 04:11:05
> LOAD AVERAGE: 223.16, 172.97, 158.23
>        TASKS: 1464
>     NODENAME: dell2.localdomain
>      RELEASE: 2.6.32-131.12.1.el6.x86_64
>      VERSION: #1 SMP Sun Jul 31 16:44:56 EDT 2011
>      MACHINE: x86_64  (2394 Mhz)
>       MEMORY: 48 GB
>        PANIC: "kernel BUG at arch/x86/kernel/traps.c:547!"
>          PID: 11851
>      COMMAND: "qemu-kvm"
>         TASK: ffff880c071c3500  [THREAD_INFO: ffff880c132d8000]
>          CPU: 1
>        STATE: TASK_RUNNING (PANIC)
>
> PID: 11851  TASK: ffff880c071c3500  CPU: 1   COMMAND: "qemu-kvm"
>  #0 [ffff880028207be0] machine_kexec at ffffffff810310cb
>  #1 [ffff880028207c40] crash_kexec at ffffffff810b6392
>  #2 [ffff880028207d10] oops_end at ffffffff814de670
>  #3 [ffff880028207d40] die at ffffffff8100f2eb
>  #4 [ffff880028207d70] do_trap at ffffffff814ddf64
>  #5 [ffff880028207dd0] do_invalid_op at ffffffff8100ceb5
>  #6 [ffff880028207e70] invalid_op at ffffffff8100bf5b
>     [exception RIP: do_nmi+554]
>     RIP: ffffffff814de43a  RSP: ffff880028207f28  RFLAGS: 00010002
>     RAX: ffff880c132d9fd8  RBX: ffff880028207f58  RCX: 00000000c0000101
>     RDX: 00000000ffff8800  RSI: ffffffffffffffff  RDI: ffff880028207f58
>     RBP: ffff880028207f48   R8: ffff88005ebf9800   R9: ffff880028203fc0
>     R10: 0000000000000034  R11: 00000000000003e8  R12: 000000000000cc20
>     R13: ffffffff816024a0  R14: ffff88005ebf9800  R15: 00007ffffffff000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffff880028207f50] nmi at ffffffff814ddc90
>     [exception RIP: bad_to_user+37]
>     RIP: ffffffff814e4e2b  RSP: ffff880028207bb0  RFLAGS: 00010046
>     RAX: ffff880c132d9fd8  RBX: ffff880c132d9c48  RCX: 0000000000000001
>     RDX: 0000000000000000  RSI: 000000010000000b  RDI: ffff880028207c08
>     RBP: ffff880028207c48   R8: ffff88005ebf9800   R9: ffff880028203fc0
>     R10: 0000000000000034  R11: 00000000000003e8  R12: 000000000000cc20
>     R13: ffffffff816024a0  R14: ffff88005ebf9800  R15: 00007ffffffff000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- <NMI exception stack> ---
>
>      For this problem, I found that the panic is caused by
> BUG_ON(in_nmi()), which means an NMI happened in another NMI context.
> But I checked the Intel technical manual and found: "While an NMI
> interrupt handler is executing, the processor disables additional
> calls to the NMI handler until the next IRET instruction is executed."
> So how can this happen?
>

The NMI path for kvm is different; the processor exits from the guest
with NMIs blocked, then executes kvm code until it issues "int $2" in
vmx_complete_interrupts(). If an IRET is executed in this path, then
NMIs will be unblocked and nested NMIs may occur.

One way this can happen is if we access the vmap area and incur a fault,
between the VMEXIT and invoking the NMI handler. Or perhaps the NMI
handler itself generates a fault. Or we have a debug exception in that path.
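
The sequence above can be sketched as a toy Python model of the NMI-blocking
latch. This is only an illustration, not kernel or KVM code: ToyCpu, NestedNmi
and kvm_nmi_exit_path are hypothetical names, and the model assumes that the
software "int $2" leaves the latch as it found it while any IRET clears it.

```python
class NestedNmi(Exception):
    """Stands in for the kernel's BUG_ON(in_nmi()) firing."""


class ToyCpu:
    """Toy model of NMI blocking; all names here are hypothetical."""

    def __init__(self):
        self.nmi_blocked = False  # hardware latch, cleared by any IRET
        self.nmi_depth = 0        # software nesting count, like in_nmi()
        self.pending = False      # an NMI held back while the latch is set

    def iret(self):
        # Any IRET unblocks NMIs; a held-back NMI is then delivered.
        self.nmi_blocked = False
        if self.pending:
            self.pending = False
            self.hw_nmi()

    def hw_nmi(self):
        # A hardware NMI is signalled.
        if self.nmi_blocked:
            self.pending = True   # held until the latch is cleared
            return
        self.nmi_blocked = True   # delivery sets the latch
        self.handler()
        self.iret()               # the handler returns with IRET

    def handler(self, fault_inside=False, nmi_arrives=False):
        if self.nmi_depth:
            raise NestedNmi("NMI handler entered while already in NMI context")
        self.nmi_depth += 1
        if fault_inside:
            self.iret()           # the fault's return path executes IRET
        if nmi_arrives:
            self.hw_nmi()         # a second NMI is signalled mid-handler
        self.nmi_depth -= 1


def kvm_nmi_exit_path(iret_at=None):
    """iret_at: None, "before" the handler, or "inside" the handler."""
    cpu = ToyCpu()
    cpu.nmi_blocked = True        # guest exit caused by an NMI: NMIs blocked
    try:
        if iret_at == "before":
            cpu.iret()            # e.g. a fault between VMEXIT and "int $2"
        # Software "int $2": run the handler without touching the latch.
        cpu.handler(fault_inside=(iret_at == "inside"), nmi_arrives=True)
        cpu.iret()                # return from "int $2"
    except NestedNmi:
        return "nested NMI"
    return "ok"


for where in (None, "before", "inside"):
    print(where, "->", kvm_nmi_exit_path(where))
```

In this model the second NMI is simply held pending when no IRET runs early,
and it nests only when an IRET clears the latch before or inside the handler.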

Is this reproducible?

-- 
error compiling committee.c: too many arguments to function

Thread overview: 4+ messages
2012-04-11  2:11 Linux Crash Caused By KVM? Peijie Yu
2012-04-11 14:45 ` Avi Kivity [this message]
2012-04-11 18:59   ` Eric Northup
2012-04-15 10:05     ` Avi Kivity