From: "Longpeng (Mike)" <longpeng2@huawei.com>
To: kvm <kvm@vger.kernel.org>, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	alex.williamson@redhat.com
Cc: "Huangweidong (C)" <weidong.huang@huawei.com>,
	Gonglei <arei.gonglei@huawei.com>,
	"wangxin (U)" <wangxinxin.wang@huawei.com>
Subject: [help] host kernel panic in kvm's wakeup_handler()
Date: Wed, 24 May 2017 11:57:34 +0800
Message-ID: <592504AE.6040306@huawei.com>

Hi guys,

We repeatedly powered 20 VMs on and off concurrently (4 of them with vfio
passthrough NICs), and after many iterations the host panicked:

[152878.870508] general protection fault: 0000 [#1] SMP
[152878.878710] collected_len = 1048576, LOG_BUF_LEN_LOCAL = 1048576
[152878.886921] kbox current status: maintain, do not flush regions to devices.
[152878.893952] kbox: notify die begin
[152878.897453] kbox: no notify die func register. no need to notify
[152878.903533] do nothing after die!
[152878.906929] Modules linked in: ib_uverbs(OVE) vhost_scsi(OE)
target_core_pscsi target_core_file target_core_iblock target_core_mod
guest_kbox_ram(O) kbox_pci(OVE) igb(OVE) mlx4_ib(OVE) ib_sa(OVE) ib_mad(OVE)
ib_core(OVE) ib_addr(OVE) ib_netlink(OVE) mlx4_en(OVE) mlx4_core(OVE)
compat(OVE) vfio_pci vfio_iommu_type1 vfio(OVE) prio(O) nat(O) vport_vxlan(O)
openvswitch(O) nf_defrag_ipv6 gre libcrc32c ixgbe(O) ext3 mbcache jbd kbox(O)
pmcint(O) signo_catch(O) dm_mod vxlan ip6_udp_tunnel udp_tunnel sd_mod
crc_t10dif crct10dif_generic sg ipmi_devintf iTCO_wdt iTCO_vendor_support
kvm_intel(O) kvm(O) coretemp crct10dif_pclmul crct10dif_common crc32_pclmul
crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul
ablk_helper cryptd mpt2sas ahci i2c_algo_bit ptp libahci raid_class pps_core
i2c_i801 libata scsi_transport_sas dca lpc_ich i2c_core mfd_core shpchp ipmi_si
ipmi_msghandler nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vhost_net(O)
tun(O) vhost(O) macvtap macvlan irqbypass ip_tables [last unloaded: igb]
[152878.998665] CPU: 10 PID: 0 Comm: swapper/10 Tainted: G        W  OE
----V-------   3.10.0-327.49.58.45_12.x86_64 #1
[152879.009245] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. CH80GPUB8/CH80GPUB8,
BIOS GPUBV201 06/18/2015
[152879.018881] task: ffff881fd2ce7300 ti: ffff881fd2d10000 task.ti:
ffff881fd2d10000
[152879.026803] RIP: 0010:[<ffffffffa1767ec1>]  [<ffffffffa1767ec1>]
wakeup_handler+0x71/0xb0 [kvm_intel]
[152879.036460] RSP: 0018:ffff883fff003f70  EFLAGS: 00010083
[152879.042024] RAX: dead000000100100 RBX: dead0000001000b0 RCX: ffff883fff0176f0
[152879.049595] RDX: ffff883fff000000 RSI: 0000000000000082 RDI: ffff881c9c7f0000
[152879.057139] RBP: ffff883fff003f90 R08: ffff881e522dfd90 R09: 0000000000000018
[152879.061675] mlx4_en: eth1: Port:2: removing fa:29:3e:2e:68:80
[152879.070720] R10: 000000000000039f R11: ffff881cfbf278f6 R12: 00000000000176e0
[152879.078282] R13: 000000000000000a R14: 00000000000176f0 R15: ffffffff81a13538
[152879.085845] FS:  0000000000000000(0000) GS:ffff883fff000000(0000)
knlGS:0000000000000000
[152879.094361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[152879.100378] CR2: 0000000000605168 CR3: 000000000195e000 CR4: 00000000003427e0
[152879.107921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[152879.115478] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[152879.123019] Stack:
[152879.125313]  0000000000000000 0000000000000004 00008b2da04a3938 0000000000000004
[152879.133227]  ffff883fff003fa8 ffffffff81016a28 ffffe8ffff800500 ffff881fd2d13e78
[152879.141121]  ffffffff81655cdd ffff881fd2d13dc8 <EOI>  ffff881fd2d13e78
00000000000003e8
[152879.149702]  ffff881cfbf278f6 000000000000039f 0000000000000018 00000000000003e8
[152879.157647]  00008b2da04f9b8e 0000000000000018 0000000225c17d03 ffff881fd2d13fd8
[152879.165597]  00008b2da04f9b8e ffffffffffffff0e ffffffff814e2b72 0000000000000010
[152879.173560]  0000000000000206 ffff881fd2d13e50 0000000000000018 ffffe8ffff800500
[152879.181401]  0000000000000004 0000000000000004 ffffffff81a133c0 0000000000000000
[152879.189297]  ffff881fd2d13eb8 ffffffff814e2cb9 0000000a00000000 ffff881fd2d10000
[152879.197183]  ffffffff81a7de20 ffff881fd2d10000 ffff881fd2d10000 0000000000000000
[152879.205069]  ffff881fd2d13ec8 ffffffff8101e68e ffff881fd2d13f20 ffffffff810d7535
[152879.212968]  ffff881fd2d13fd8 ffff881fd2d10000 a960cc5a1933ed1c ef90c751bae26ef0
[152879.220892]  ffff881fd2d13f30 0000000000000000 0000000000000000 0000000000000000
[152879.228792]  0000000000000000 ffff881fd2d13f48 ffffffff81047c1a ef90c751bae26ef0
[152879.236675]  f26ae3384c8900f4 0000000000000000 0000000000000000 0000000000000000
[152879.244597]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[152879.252505]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[152879.260490]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[152879.268393]  0000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
[152879.276269]  0000000000000000 0000000000000010 0000000000000202 ffff881fd2d13f58
[152879.284171]  0000000000000018
[152879.287489] Call Trace:
[152879.290205]  <IRQ>
[152879.292219]  [<ffffffff81016a28>] smp_kvm_posted_intr_wakeup_ipi+0x48/0x60
[152879.299762]  [<ffffffff81655cdd>] kvm_posted_intr_wakeup_ipi+0x6d/0x80
[152879.306567]  <EOI>
[152879.308598]  [<ffffffff814e2b72>] ? cpuidle_enter_state+0x52/0xc0
[152879.315359]  [<ffffffff814e2cb9>] cpuidle_idle_call+0xd9/0x210
[152879.321481]  [<ffffffff8101e68e>] arch_cpu_idle+0xe/0x30
[152879.327058]  [<ffffffff810d7535>] cpu_startup_entry+0x245/0x290
[152879.333224]  [<ffffffff81047c1a>] start_secondary+0x1ba/0x230
[152879.339222] Code: 4a 8d 0c 32 48 39 c8 48 8d 58 b0 75 1e eb 3b 0f 1f 00 4a
8b 14 ed a0 14 a7 81 48 8b 43 50 49 8d 0c 16 48 8d 58 b0 48 39 c8 74 1f <48> 8b
83 e0 30 00 00 a8 01 74 dc 48 89 df e8 1c 6d e5 fe eb d2
[152879.360254] RIP  [<ffffffffa1767ec1>] wakeup_handler+0x71/0xb0 [kvm_intel]
[152879.367436]  RSP <ffff883fff003f70>
[152879.371668] ---[ end trace 382c2b1701889417 ]---

There is no vmcore for some reason, but we disassembled wakeup_handler():
    ......
    1e92:       4a 8b 04 32             mov    (%rdx,%r14,1),%rax <-- *Here*
    1e96:       4a 8d 0c 32             lea    (%rdx,%r14,1),%rcx
    1e9a:       48 39 c8                cmp    %rcx,%rax
    1e9d:       48 8d 58 b0             lea    -0x50(%rax),%rbx
    1ea1:       75 1e                   jne    1ec1 <wakeup_handler+0x71>
    1ea3:       eb 3b                   jmp    1ee0 <wakeup_handler+0x90>
    1ea5:       0f 1f 00                nopl   (%rax)
    1ea8:       4a 8b 14 ed 00 00 00    mov    0x0(,%r13,8),%rdx
    1eaf:       00
    1eb0:       48 8b 43 50             mov    0x50(%rbx),%rax
    1eb4:       49 8d 0c 16             lea    (%r14,%rdx,1),%rcx
    1eb8:       48 8d 58 b0             lea    -0x50(%rax),%rbx
    1ebc:       48 39 c8                cmp    %rcx,%rax
    1ebf:       74 1f                   je     1ee0 <wakeup_handler+0x90>
    1ec1:       48 8b 83 e0 30 00 00    mov    0x30e0(%rbx),%rax <-- *Here*
    ......
It crashed at *1ec1*, and %rax got a wrong value (0xdead000000100100) at *1e92*.
It seems the *blocked_vcpu_on_cpu* list is corrupted, but kvm only accesses this
list in pre_block/post_block/wakeup_handler, and these three functions seem
correct.
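
For reference, the bad value looks like what list_del() leaves in the ->next
pointer of a removed entry. The user-space sketch below only redoes the
arithmetic from the oops. It assumes the 3.10-era LIST_POISON1 definition
(0x00100100 plus the x86_64 POISON_POINTER_DELTA of 0xdead000000000000) and
48-bit virtual addresses; the helper names are made up for illustration and
are not kernel functions:

```c
#include <stdint.h>

/* Assumption: values as in include/linux/poison.h of a 3.10-era kernel
 * built for x86_64 (CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000). */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (0x00100100ULL + POISON_POINTER_DELTA) /* ->next after list_del() */

/* Taken from the disassembly: "lea -0x50(%rax),%rbx" converts a list node
 * address into the address of the structure that embeds it, and
 * "mov 0x30e0(%rbx),%rax" is the field load that faulted. */
#define NODE_OFFSET  0x50ULL
#define FIELD_OFFSET 0x30e0ULL

/* Hypothetical helper: the container_of() arithmetic done by the loop. */
static uint64_t entry_from_node(uint64_t node_addr)
{
    return node_addr - NODE_OFFSET;
}

/* Hypothetical helper: with 48-bit virtual addresses, an address is
 * canonical only if bits 63..47 are all zeros or all ones. */
static int is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1ffffULL;
}

/* Address dereferenced at 1ec1 if the node pointer was LIST_POISON1. */
static uint64_t faulting_address(void)
{
    return entry_from_node(LIST_POISON1) + FIELD_OFFSET;
}
```

Under these assumptions LIST_POISON1 equals the %rax value in the oops,
entry_from_node(LIST_POISON1) equals the %rbx value (0xdead0000001000b0), and
faulting_address() is non-canonical, which would produce a general protection
fault instead of a page fault. That would be consistent with the handler
walking onto an entry that had already been deleted from the list, but it does
not show which path deleted it.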

kvm version is 4.4-stable.

Do you have any ideas? Any suggestion would be greatly appreciated, thanks!

-- 
Regards,
Longpeng(Mike)
