From: "Longpeng (Mike)" <longpeng2@huawei.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm <kvm@vger.kernel.org>, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	Gonglei <arei.gonglei@huawei.com>,
	"wangxin (U)" <wangxinxin.wang@huawei.com>
Subject: Re: [help] host kernel panic in kvm's wakeup_handler()
Date: Wed, 24 May 2017 13:04:11 +0800
Message-ID: <5925144B.2030207@huawei.com>
In-Reply-To: <20170523223419.59f7d465@t450s.home>



On 2017/5/24 12:34, Alex Williamson wrote:

> On Wed, 24 May 2017 11:57:34 +0800
> "Longpeng (Mike)" <longpeng2@huawei.com> wrote:
> 
>> Hi guys,
>>
>> We concurrently powered 20 VMs (4 of them with vfio passthrough NICs) on and
>> off many times, and then hit a host panic:
>>
>> [152878.870508] general protection fault: 0000 [#1] SMP
>> [152878.878710] collected_len = 1048576, LOG_BUF_LEN_LOCAL = 1048576
>> [152878.886921] kbox current status: maintain, do not flush regions to devices.
>> [152878.893952] kbox: notify die begin
>> [152878.897453] kbox: no notify die func register. no need to notify
>> [152878.903533] do nothing after die!
>> [152878.906929] Modules linked in: ib_uverbs(OVE) vhost_scsi(OE)
>> target_core_pscsi target_core_file target_core_iblock target_core_mod
>> guest_kbox_ram(O) kbox_pci(OVE) igb(OVE) mlx4_ib(OVE) ib_sa(OVE) ib_mad(OVE)
>> ib_core(OVE) ib_addr(OVE) ib_netlink(OVE) mlx4_en(OVE) mlx4_core(OVE)
>> compat(OVE) vfio_pci vfio_iommu_type1 vfio(OVE) prio(O) nat(O) vport_vxlan(O)
>> openvswitch(O) nf_defrag_ipv6 gre libcrc32c ixgbe(O) ext3 mbcache jbd kbox(O)
>> pmcint(O) signo_catch(O) dm_mod vxlan ip6_udp_tunnel udp_tunnel sd_mod
>> crc_t10dif crct10dif_generic sg ipmi_devintf iTCO_wdt iTCO_vendor_support
>> kvm_intel(O) kvm(O) coretemp crct10dif_pclmul crct10dif_common crc32_pclmul
>> crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul
>> ablk_helper cryptd mpt2sas ahci i2c_algo_bit ptp libahci raid_class pps_core
>> i2c_i801 libata scsi_transport_sas dca lpc_ich i2c_core mfd_core shpchp ipmi_si
>> ipmi_msghandler nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vhost_net(O)
>> tun(O) vhost(O) macvtap macvlan irqbypass ip_tables [last unloaded: igb]
>> [152878.998665] CPU: 10 PID: 0 Comm: swapper/10 Tainted: G        W  OE
>> ----V-------   3.10.0-327.49.58.45_12.x86_64 #1
>> [152879.009245] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. CH80GPUB8/CH80GPUB8,
>> BIOS GPUBV201 06/18/2015
>> [152879.018881] task: ffff881fd2ce7300 ti: ffff881fd2d10000 task.ti:
>> ffff881fd2d10000
>> [152879.026803] RIP: 0010:[<ffffffffa1767ec1>]  [<ffffffffa1767ec1>]
>> wakeup_handler+0x71/0xb0 [kvm_intel]
>> [152879.036460] RSP: 0018:ffff883fff003f70  EFLAGS: 00010083
>> [152879.042024] RAX: dead000000100100 RBX: dead0000001000b0 RCX: ffff883fff0176f0
>> [152879.049595] RDX: ffff883fff000000 RSI: 0000000000000082 RDI: ffff881c9c7f0000
>> [152879.057139] RBP: ffff883fff003f90 R08: ffff881e522dfd90 R09: 0000000000000018
>> [152879.061675] mlx4_en: eth1: Port:2: removing fa:29:3e:2e:68:80
>> [152879.070720] R10: 000000000000039f R11: ffff881cfbf278f6 R12: 00000000000176e0
>> [152879.078282] R13: 000000000000000a R14: 00000000000176f0 R15: ffffffff81a13538
>> [152879.085845] FS:  0000000000000000(0000) GS:ffff883fff000000(0000)
>> knlGS:0000000000000000
>> [152879.094361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [152879.100378] CR2: 0000000000605168 CR3: 000000000195e000 CR4: 00000000003427e0
>> [152879.107921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [152879.115478] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [152879.123019] Stack:
>> [152879.125313]  0000000000000000 0000000000000004 00008b2da04a3938 0000000000000004
>> [152879.133227]  ffff883fff003fa8 ffffffff81016a28 ffffe8ffff800500 ffff881fd2d13e78
>> [152879.141121]  ffffffff81655cdd ffff881fd2d13dc8 <EOI>  ffff881fd2d13e78
>> 00000000000003e8
>> [152879.149702]  ffff881cfbf278f6 000000000000039f 0000000000000018 00000000000003e8
>> [152879.157647]  00008b2da04f9b8e 0000000000000018 0000000225c17d03 ffff881fd2d13fd8
>> [152879.165597]  00008b2da04f9b8e ffffffffffffff0e ffffffff814e2b72 0000000000000010
>> [152879.173560]  0000000000000206 ffff881fd2d13e50 0000000000000018 ffffe8ffff800500
>> [152879.181401]  0000000000000004 0000000000000004 ffffffff81a133c0 0000000000000000
>> [152879.189297]  ffff881fd2d13eb8 ffffffff814e2cb9 0000000a00000000 ffff881fd2d10000
>> [152879.197183]  ffffffff81a7de20 ffff881fd2d10000 ffff881fd2d10000 0000000000000000
>> [152879.205069]  ffff881fd2d13ec8 ffffffff8101e68e ffff881fd2d13f20 ffffffff810d7535
>> [152879.212968]  ffff881fd2d13fd8 ffff881fd2d10000 a960cc5a1933ed1c ef90c751bae26ef0
>> [152879.220892]  ffff881fd2d13f30 0000000000000000 0000000000000000 0000000000000000
>> [152879.228792]  0000000000000000 ffff881fd2d13f48 ffffffff81047c1a ef90c751bae26ef0
>> [152879.236675]  f26ae3384c8900f4 0000000000000000 0000000000000000 0000000000000000
>> [152879.244597]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [152879.252505]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [152879.260490]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [152879.268393]  0000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
>> [152879.276269]  0000000000000000 0000000000000010 0000000000000202 ffff881fd2d13f58
>> [152879.284171]  0000000000000018
>> [152879.287489] Call Trace:
>> [152879.290205]  <IRQ>
>> [152879.292219]  [<ffffffff81016a28>] smp_kvm_posted_intr_wakeup_ipi+0x48/0x60
>> [152879.299762]  [<ffffffff81655cdd>] kvm_posted_intr_wakeup_ipi+0x6d/0x80
>> [152879.306567]  <EOI>
>> [152879.308598]  [<ffffffff814e2b72>] ? cpuidle_enter_state+0x52/0xc0
>> [152879.315359]  [<ffffffff814e2cb9>] cpuidle_idle_call+0xd9/0x210
>> [152879.321481]  [<ffffffff8101e68e>] arch_cpu_idle+0xe/0x30
>> [152879.327058]  [<ffffffff810d7535>] cpu_startup_entry+0x245/0x290
>> [152879.333224]  [<ffffffff81047c1a>] start_secondary+0x1ba/0x230
>> [152879.339222] Code: 4a 8d 0c 32 48 39 c8 48 8d 58 b0 75 1e eb 3b 0f 1f 00 4a
>> 8b 14 ed a0 14 a7 81 48 8b 43 50 49 8d 0c 16 48 8d 58 b0 48 39 c8 74 1f <48> 8b
>> 83 e0 30 00 00 a8 01 74 dc 48 89 df e8 1c 6d e5 fe eb d2
>> [152879.360254] RIP  [<ffffffffa1767ec1>] wakeup_handler+0x71/0xb0 [kvm_intel]
>> [152879.367436]  RSP <ffff883fff003f70>
>> [152879.371668] ---[ end trace 382c2b1701889417 ]---
>>
>> There's no vmcore for some reason, but we disassembled wakeup_handler():
>>     ......
>>     1e92:       4a 8b 04 32             mov    (%rdx,%r14,1),%rax <-- *Here*
>>     1e96:       4a 8d 0c 32             lea    (%rdx,%r14,1),%rcx
>>     1e9a:       48 39 c8                cmp    %rcx,%rax
>>     1e9d:       48 8d 58 b0             lea    -0x50(%rax),%rbx
>>     1ea1:       75 1e                   jne    1ec1 <wakeup_handler+0x71>
>>     1ea3:       eb 3b                   jmp    1ee0 <wakeup_handler+0x90>
>>     1ea5:       0f 1f 00                nopl   (%rax)
>>     1ea8:       4a 8b 14 ed 00 00 00    mov    0x0(,%r13,8),%rdx
>>     1eaf:       00
>>     1eb0:       48 8b 43 50             mov    0x50(%rbx),%rax
>>     1eb4:       49 8d 0c 16             lea    (%r14,%rdx,1),%rcx
>>     1eb8:       48 8d 58 b0             lea    -0x50(%rax),%rbx
>>     1ebc:       48 39 c8                cmp    %rcx,%rax
>>     1ebf:       74 1f                   je     1ee0 <wakeup_handler+0x90>
>>     1ec1:       48 8b 83 e0 30 00 00    mov    0x30e0(%rbx),%rax <-- *Here*
>>     ......
>> It crashed at *1ec1*, and %rax got a bad value (0xdead000000100100) at *1e92*.
>> It seems the *blocked_vcpu_on_cpu* list is corrupted, but kvm only accesses this
>> list in pre_block/post_block/wakeup_handler, and these three functions look fine.
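
A note on the register values above, as a rough sketch rather than a diagnosis:
RAX (0xdead000000100100) has the shape of the kernel's list poison, the value
list_del() writes into the ->next pointer of an entry it removes. The constants
below are assumptions taken from include/linux/poison.h as it looked in x86_64
kernels of this era, and should be checked against the actual host source. The
0x50 offset is read from "lea -0x50(%rax),%rbx" in the disassembly. The helper
names are made up for illustration.

```python
# Assumed constants (verify against the host kernel's include/linux/poison.h):
# on x86_64, POISON_POINTER_DELTA comes from CONFIG_ILLEGAL_POINTER_VALUE.
POISON_POINTER_DELTA = 0xdead000000000000
LIST_POISON1 = 0x00100100 + POISON_POINTER_DELTA  # written to ->next by list_del()
LIST_POISON2 = 0x00200200 + POISON_POINTER_DELTA  # written to ->prev by list_del()

# Offset of the list node inside the containing structure, taken from
# "lea -0x50(%rax),%rbx" in the disassembly of wakeup_handler().
NODE_OFFSET = 0x50


def classify_pointer(value):
    """Say whether a register value matches one of the assumed list poisons."""
    if value == LIST_POISON1:
        return "LIST_POISON1 (->next of a deleted entry)"
    if value == LIST_POISON2:
        return "LIST_POISON2 (->prev of a deleted entry)"
    return "not a list poison value"


def entry_from_node(node_addr):
    """Mimic container_of(): step back from the list node to its container."""
    return (node_addr - NODE_OFFSET) & 0xFFFFFFFFFFFFFFFF


rax = 0xdead000000100100  # RAX from the oops
rbx = 0xdead0000001000b0  # RBX from the oops

print("RAX:", classify_pointer(rax))
print("RAX - 0x50 == RBX:", entry_from_node(rax) == rbx)
```

If these constants match the host kernel, the iterator followed the ->next
pointer of an entry that had already been removed from the list, and the load
at 1ec1 then used a non-canonical address, which is consistent with a general
protection fault. It does not show which path removed the entry.
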
>>
>> kvm version is 4.4-stable.
>>
>> Do you have any ideas? Any suggestion would be greatly appreciated, thanks!
>>
> 
> Is this only seen with posted interrupt support enabled?  Booting with
> intremap=nopost on the kernel commandline would disable it.  Thanks,
> 
> Alex
> 


Hi Alex,

We tested with PI support enabled, but we are not yet sure whether it only
occurs with PI enabled.

*lscpu:*
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2618L v4 @ 2.20GHz
Stepping:              1
CPU MHz:               1452.085
BogoMIPS:              4405.88
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39

We will try to reproduce the problem again. Thanks :)

> .
> 


-- 
Regards,
Longpeng(Mike)
