From: Armin Zentai <armin.zentai@ezit.hu>
To: Wei Liu <wei.liu2@citrix.com>
Cc: zoltan.kiss@citrix.com, xen-devel@lists.xen.org
Subject: Re: Trying to unmap invalid handle! pending_idx: @ drivers/net/xen-netback/netback.c:998 causes kernel panic/reboot
Date: Mon, 14 Jul 2014 12:53:49 +0200
Message-ID: <53C3B6BD.1050904@ezit.hu>
In-Reply-To: <20140714095220.GB8551@zion.uk.xensource.com>
Hello!
On 14/07/14 11:52, Wei Liu wrote:
> Hello
>
> On Mon, Jul 14, 2014 at 04:25:54AM +0200, Armin Zentai wrote:
>> Dear Xen Developers!
>>
>>
>> We're running Xen on multiple machines, most of them Dell R410 or SM
>> X8DTL, with one E5645 CPU and 48 GB of RAM. We've updated the kernel to
>> 3.15.4, after which some of our hypervisors started rebooting at random
>> times.
>>
>> The logs were empty, and we had no information about the crashes. We
>> tried some tricks, and in the end the netconsole kernel module helped, so
>> we can do a very thin layer of remote kernel logging. We've found the
>> following in the remote logs:
>
> It's good you've got netconsole working. I would still like to point out
> that we have a wiki page on setting up serial console on Xen, which
> might be helpful.
>
> http://wiki.xen.org/wiki/Xen_Serial_Console
>
We've set up the Xen serial console, but we want to avoid rebooting the
hypervisors unless it's necessary, so it will be activated on each
system at its next reboot.
(We have also set up a system that logs into every Dell iDRAC via telnet
[we have 18 nodes, so we cannot attach a physical serial link to every
machine], sets up SOL, and logs all output, but netconsole was a much
less painful way to gather the logs.)
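For reference, the receiving side of netconsole can be very small, since
the module just sends kernel messages as UDP datagrams. A minimal sketch
in Python, not what we actually run: port 6666 is the netconsole default
target port, while the function names and the log file name are made up
for illustration.

```python
import socket
import time


def format_record(payload: bytes, sender_ip: str, now: float) -> str:
    """Prefix one received datagram with a local timestamp and the sender."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    stamp = time.strftime("%b %d %H:%M:%S", time.localtime(now))
    return f"{stamp} {sender_ip} {text}"


def serve(bind_addr: str = "0.0.0.0", port: int = 6666,
          logfile: str = "netconsole.log") -> None:
    """Receive netconsole datagrams forever and append them to logfile.

    The file name is a hypothetical placeholder; 6666 is the default
    netconsole target port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    # Line-buffered, so the last lines before a panic reach the disk.
    with open(logfile, "a", buffering=1) as out:
        while True:
            payload, (ip, _src_port) = sock.recvfrom(65535)
            out.write(format_record(payload, ip, time.time()) + "\n")
```

Calling serve() on the log host blocks and writes one line per received
datagram, so a message that the kernel emits in several pieces shows up
as several short lines.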
>>
>> Jul 13 00:46:58 node11 [157060.106323] vif vif-2-0 h14z4mzbvfrrhb: Trying to
>> unmap invalid handle! pending_idx: c
>> Jul 13 00:46:58 node11 [157060.106476] ------------[ cut here ]------------
>> Jul 13 00:46:58 node11 [157060.106546] kernel BUG at
>> drivers/net/xen-netback/netback.c:998!
>> Jul 13 00:46:58 node11 [157060.106616] invalid opcode: 0000 [#1]
>> Jul 13 00:46:58 SMP
>> Jul 13 00:46:58 node11
> [...]
>> Jul 13 00:46:58 node11 [157060.112705] CPU: 0 PID: 0 Comm: swapper/0
>> Tainted: G E 3.15.4 #1
>> Jul 13 00:46:58 node11 [157060.112776] Hardware name: Supermicro
>> X8DTL/X8DTL, BIOS 1.1b 03/19/2010
>> Jul 13 00:46:58 node11 [157060.112848] task: ffffffff81c1b480 ti:
>> ffffffff81c00000 task.ti: ffffffff81c00000
>> Jul 13 00:46:58 node11 [157060.112936] RIP: e030:[<ffffffffa025f61d>]
>> Jul 13 00:46:58 node11 [<ffffffffa025f61d>] xenvif_idx_unmap+0x11d/0x130
>> [xen_netback]
>> Jul 13 00:46:58 node11 [157060.113078] RSP: e02b:ffff88008ea03d48 EFLAGS:
>> 00010292
>> Jul 13 00:46:58 node11 [157060.113147] RAX: 000000000000004a RBX:
>> 000000000000000c RCX: 0000000000000000
>> Jul 13 00:46:58 node11 [157060.113234] RDX: ffff88008a40b600 RSI:
>> ffff88008ea03a18 RDI: 000000000000021b
>> Jul 13 00:46:58 node11 [157060.113321] RBP: ffff88008ea03d88 R08:
>> 0000000000000000 R09: ffff88008a40b600
>> Jul 13 00:46:58 node11 [157060.113408] R10: ffff88008a0004e8 R11:
>> 00000000000006d8 R12: ffff8800569708c0
>> Jul 13 00:46:58 node11 [157060.113495] R13: ffff88006558fec0 R14:
>> ffff8800569708c0 R15: 0000000000000001
>> Jul 13 00:46:58 node11 [157060.113589] FS: 00007f351684b700(0000)
>> GS:ffff88008ea00000(0000) knlGS:0000000000000000
>> Jul 13 00:46:58 node11 [157060.113679] CS: e033 DS: 0000 ES: 0000 CR0:
>> 000000008005003b
>> Jul 13 00:46:58 node11 [157060.113747] CR2: 00007fc2a4372000 CR3:
>> 00000000049f3000 CR4: 0000000000002660
>> Jul 13 00:46:58 node11 [157060.113835] Stack:
>> Jul 13 00:46:58 node11 [157060.113896] ffff880056979f90
>> Jul 13 00:46:58 node11 ff00000000000001
>> Jul 13 00:46:58 node11 ffff880b0605e000
>> Jul 13 00:46:58 node11 0000000000000000
>> Jul 13 00:46:58 node11
>> Jul 13 00:46:58 node11 [157060.114143] ffff0000ffffffff
>> Jul 13 00:46:58 node11 00000000fffffff6
>> Jul 13 00:46:58 node11 0000000000000001
>> Jul 13 00:46:58 node11 ffff8800569769d0
>> Jul 13 00:46:58 node11
>> Jul 13 00:46:58 node11 [157060.114390] ffff88008ea03e58
>> Jul 13 00:46:58 node11 ffffffffa02622fc
>> Jul 13 00:46:58 node11 ffff88008ea03dd8
>> Jul 13 00:46:58 node11 ffffffff810b5223
>> Jul 13 00:46:58 node11
>> Jul 13 00:46:58 node11 [157060.114637] Call Trace:
>> Jul 13 00:46:58 node11 [157060.114700] <IRQ>
>> Jul 13 00:46:58 node11
>> Jul 13 00:46:58 node11 [157060.114750]
>> Jul 13 00:46:58 node11 [<ffffffffa02622fc>] xenvif_tx_action+0x27c/0x7f0
>> [xen_netback]
>> Jul 13 00:46:58 node11 [157060.114927] [<ffffffff810b5223>] ?
>> __wake_up+0x53/0x70
>> Jul 13 00:46:58 node11 [157060.114998] [<ffffffff810ca077>] ?
>> handle_irq_event_percpu+0xa7/0x1b0
>> Jul 13 00:46:58 node11 [157060.115073] [<ffffffffa02647d1>]
>> xenvif_poll+0x31/0x64 [xen_netback]
>> Jul 13 00:46:58 node11 [157060.115147] [<ffffffff81653d4b>]
>> net_rx_action+0x10b/0x290
>> Jul 13 00:46:58 node11 [157060.115221] [<ffffffff81071c73>]
>> __do_softirq+0x103/0x320
>> Jul 13 00:46:58 node11 [157060.115292] [<ffffffff81072015>]
>> irq_exit+0x135/0x140
>> Jul 13 00:46:58 node11 [157060.115363] [<ffffffff8144759c>]
>> xen_evtchn_do_upcall+0x3c/0x50
>> Jul 13 00:46:58 node11 [157060.115436] [<ffffffff8175c07e>]
>> xen_do_hypervisor_callback+0x1e/0x30
>> Jul 13 00:46:58 node11 [157060.115506] <EOI>
>> Jul 13 00:46:58 node11
>> Jul 13 00:46:58 node11 [157060.115551]
>> Jul 13 00:46:58 node11 [<ffffffff810013aa>] ?
>> xen_hypercall_sched_op+0xa/0x20
>> Jul 13 00:46:58 node11 [157060.115722] [<ffffffff810013aa>] ?
>> xen_hypercall_sched_op+0xa/0x20
>> Jul 13 00:46:58 node11 [157060.115794] [<ffffffff8100a200>] ?
>> xen_safe_halt+0x10/0x20
>> Jul 13 00:46:58 node11 [157060.115869] [<ffffffff8101dbbf>] ?
>> default_idle+0x1f/0xc0
>> Jul 13 00:46:58 node11 [157060.115939] [<ffffffff8101d38f>] ?
>> arch_cpu_idle+0xf/0x20
>> Jul 13 00:46:58 node11 [157060.116009] [<ffffffff810b5aa1>] ?
>> cpu_startup_entry+0x201/0x360
>> Jul 13 00:46:58 node11 [157060.116084] [<ffffffff817420a7>] ?
>> rest_init+0x77/0x80
>> Jul 13 00:46:58 node11 [157060.116156] [<ffffffff81d3a156>] ?
>> start_kernel+0x406/0x413
>> Jul 13 00:46:58 node11 [157060.116227] [<ffffffff81d39b6e>] ?
>> repair_env_string+0x5b/0x5b
>> Jul 13 00:46:58 node11 [157060.116298] [<ffffffff81d39603>] ?
>> x86_64_start_reservations+0x2a/0x2c
>> Jul 13 00:46:58 node11 [157060.116373] [<ffffffff81d3d5dc>] ?
>> xen_start_kernel+0x584/0x586
> [...]
>> Jul 13 00:46:58 node11
>> Jul 13 00:46:58 node11 [157060.119179] RIP
>> Jul 13 00:46:58 node11 [<ffffffffa025f61d>] xenvif_idx_unmap+0x11d/0x130
>> [xen_netback]
>> Jul 13 00:46:58 node11 [157060.119312] RSP <ffff88008ea03d48>
>> Jul 13 00:46:58 node11 [157060.119395] ---[ end trace 7e021c96c8cfea53 ]---
>> Jul 13 00:46:58 node11 [157060.119465] Kernel panic - not syncing: Fatal
>> exception in interrupt
>>
>>
>> h14z4mzbvfrrhb is the name of a VIF. This VIF belongs to a Windows Server
>> 2008 R2 x64 virtual machine. We have had 6 random reboots so far; in every
>> case the VIF belonged to a guest running this operating system, but to
>> different virtual machines. So only the virtual interfaces of Windows
>> Server 2008 R2 x64 systems caused the crashes. These systems were
>> provisioned from different installs or templates, and the GPLPV driver
>> versions also differ.
>>
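To check which interface each crash named, the collected netconsole logs
can be scanned for this message. A minimal sketch in Python, assuming the
log line format quoted above and that pending_idx is printed in
hexadecimal (the value "c" suggests so); the function name and the regular
expression are my own illustration, not part of any Xen tooling.

```python
import re

# Matches lines such as:
#   Jul 13 00:46:58 node11 [157060.106323] vif vif-2-0 h14z4mzbvfrrhb:
#   Trying to unmap invalid handle! pending_idx: c
# (a single line in the raw log; it is only wrapped here by the mail client)
PATTERN = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"\[\s*(?P<uptime>\d+\.\d+)\] vif (?P<vif>\S+) (?P<name>[^:\s]+): "
    r"Trying to unmap invalid handle! pending_idx: (?P<idx>[0-9a-f]+)"
)


def find_unmap_errors(lines):
    """Return one dict per 'invalid handle' message found in lines."""
    hits = []
    for line in lines:
        m = PATTERN.match(line.strip())
        if m:
            hits.append({
                "when": m["when"],
                "host": m["host"],
                "vif": m["vif"],
                "name": m["name"],
                # Assumption: the index is hexadecimal.
                "pending_idx": int(m["idx"], 16),
            })
    return hits
```

Feeding it the lines of each node's log file gives a short list of
(host, VIF name, pending_idx) records that can be compared across the
crashes.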
>
> Unfortunately I don't have Windows server 2008 R2. :-(
>
> This bug is in guest TX path. What's the workload of your guest? Is
> there any pattern of its traffic?
There is no apparent pattern: some of them use one core at nearly 100%,
others had 1-2% CPU and 5-10 Mbps of network and/or IO traffic. I've
tried to stress the CPU with CPU burn and prime95, and the network with
a SYN flood, IIS stress testing with Apache ab, and throughput/bandwidth
tests, but none of these attempts caused a reboot.
>
> I've checked changesets between 3.15.4 and 3.16-rc5 there's no fix for
> this, so this is the first report of this issue. If there's a reliable
> reproduce then that would be great.
>
> Zoltan, have you seen this before? Can your work on pktgen help?
>
>> [root@c2-node11 ~]# uname -a
>> Linux c2-node11 3.15.4 #1 SMP Tue Jul 8 17:58:26 CEST 2014 x86_64 x86_64
>> x86_64 GNU/Linux
>>
>>
>> The xm create config file of the specified VM (the other VM's config files
>> are the same):
>>
>> kernel = "/usr/lib/xen/boot/hvmloader"
>> device_model = "/usr/lib64/xen/bin/qemu-dm"
>> builder = "hvm"
>> memory = "2000"
>> name = "vna3mhwnv9pn4m"
>> vcpus = "1"
>>
>> timer_mode = "2"
>> viridian = "1"
>>
>> vif = [ "type=ioemu, mac=00:16:3e:64:c8:ba, bridge=x0evss6g1ztoa4, ip=...,
>> vifname=h14z4mzbvfrrhb, rate=100Mb/s" ]
>>
>> disk = [ "phy:/dev/q7jiqc2gh02b2b/xz7wget4ycmp77,ioemu:hda,w" ]
>> vnc = 1
>> vncpasswd="aaaaa1"
>> usbdevice="tablet"
>>
>>
>> The HV's networking looks like this:
>> We are using dual Emulex 10 Gbit network adapters with bonding (LACP), and
>> on top of the bond we're using VLANs for the VM, management and iSCSI
>> traffic.
>> We've tried to reproduce the error, but we couldn't; the crash/reboot
>> happened at random every time.
>>
>
> In that case you will need to instrument netback to spit out more
> information. Zoltan, is there any other information that you would like
> to know?
>
> Wei.
>
>> Thanks for your help,
>>
>> - Armin Zentai
>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel