From mboxrd@z Thu Jan 1 00:00:00 1970
From: Armin Zentai
Subject: Re: [BUG] Centos 7 shutdown can cause dom0 kernel panic
Date: Wed, 13 May 2015 17:16:04 +0200
Message-ID: <55536AB4.2040000@ezit.hu>
References: <5553134C.7050005@ezit.hu> <1431514736.8263.245.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1431514736.8263.245.camel@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 13/05/15 12:58, Ian Campbell wrote:
> On Wed, 2015-05-13 at 11:03 +0200, Armin Zentai wrote:
>> Dear Xen Developers!
>>
>>
>> I'd like to report a bug that can cause an HV reboot.
>
> Thanks.
>
> FWIW this is a dom0 kernel panic somewhere in either the network or
> iscsi stack, not a panic in the hypervisor itself.
>
> I mention this because the set of people who would be expected to look
> into such a thing would be different. I've also tweaked the subject to
> reflect this.
>
> Ian.

Thanks!

Some extra info that may be useful:
I was able to reproduce the problem on different hardware, with the same
OS, Xen, VM and settings (and with the same call trace output).

Other hardware:
NIC: Emulex Corporation OneConnect 10Gb iSCSI Initiator (rev 02)
Motherboard: Dell Inc. PowerEdge R610
CPU: X5650

>
>>
>> Initiating an xm shutdown command to a CentOS 7 VM can cause a
>> Hypervisor kernel panic.
>>
>> Fortunately I can reproduce this bug for every second shutdown
>> (statistically), this HV was pulled out from production environment, so
>> I can test anything on it from now.
>>
>> Running: xm shutdown p39glp9m68muq2
>> after some seconds, HV kernel panic
>>
>>
>>
>> Dom0 Operating system:
>> Linux c2-node08 3.10.55-11.el6.centos.alt.x86_64 #1 SMP Fri Sep 26
>> 19:08:24 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>>
>> DomU Operating system (guest centos7):
>> Linux centos7memtest 3.10.0-123.20.1.el7.onapp.x86_64 #1 SMP Fri Feb 6
>> 14:54:22 EET 2015 x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>> HW:
>> CPU: Intel Xeon E5645 @ 2.40 GHz
>> Chassis/Motherboard: Dell PowerEdge R410
>> Memory: 48GB (4x16GB HMT42GR7BMR4A-G7)
>> Disk: INTEL SSDSA2CT04 - 40GB
>> NIC: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
>>
>> We're using a SAN network, 2x2 multipath iSCSI, with an EMC VNX 5300
>> storage. iSCSI is connected via 2x10 Gbit, through two Cisco Nexus 5548
>> switches.
>>
>>
>> Grub config:
>> kernel /boot/xen.gz dom0_mem=3145728 dom0_max_vcpus=6 log_lvl=all
>> guest_loglvl=all noreboot=true
>> module /boot/vmlinuz-3.10.55-11.el6.centos.alt.x86_64 ro
>> root=UUID=796d645a-c97a-4574-a4fc-716cbeb7247e rd_NO_LUKS rd_NO_LVM
>> LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto
>> KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM pcie_asmp=off
>> module /boot/initramfs-3.10.55-11.el6.centos.alt.x86_64.img
>>
>> We are using these extra settings at startup
>> ethtool -K tge1 tso off gso off lro off
>> ethtool -K tge2 tso off gso off lro off
>>
>> sysctl -w vm.min_free_kbytes=262144
>> sysctl -w kernel.sem="250 32000 100 512"
>>
>> Dmesg output from dom0 is attached as dom0_dmesg.txt
>> xl dmesg output from dom0 is attached as dom0_xl_dmesg.txt
>>
>> Instead of serial we are using netconsole, but it gives us a good
>> output. Attached as netconsole_log.txt. But I'm putting here the top of
>> the call stack:
>> [] ? iscsi_tcp_recv_skb+0x1b8/0x3c0 [libiscsi_tcp]
>> [] iscsi_sw_tcp_recv+0x49/0xe0 [iscsi_tcp]
>> [] tcp_read_sock+0x95/0x1e0
>> [] ? local_bh_enable+0x27/0xa0
>> [] ? iscsi_sw_tcp_state_change+0xd0/0xd0 [iscsi_tcp]
>> [] iscsi_sw_tcp_data_ready+0x57/0xd8 [iscsi_tcp]
>> [] ? tcp_try_rmem_schedule+0x6d/0x130
>> [] tcp_data_queue+0x37f/0x5c0
>> [] tcp_rcv_established+0x319/0
>>
>> If it's required I can configure a serial link to a HV, to catch a
>> better output.
>>
>> Dmesg output from domU is attached as centos7_dmesg.txt
>>
>> Xen version: 4.2.4-33.el6, full xen info output is attached as xen_info.txt
>>
>> Probably it's related to iSCSI, so I'm putting here some info about the
>> network and iSCSI settings:
>>
>>
>> Interfaces:
>>
>> 4: tge1: mtu 9000 qdisc mq state UP
>> qlen 1000
>> link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
>> 5: tge2: mtu 9000 qdisc mq state UP
>> qlen 1000
>> link/ether a0:36:9f:2a:70:92 brd ff:ff:ff:ff:ff:ff
>> 6: tge1.20@tge1: mtu 1500 qdisc
>> noqueue state UP
>> link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
>> 7: tge1.12@tge1: mtu 9000 qdisc
>> noqueue state UP
>> link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
>> 8: tge1.4@tge1: mtu 9000 qdisc noqueue
>> state UP
>> link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
>> 9: tge2.5@tge2: mtu 9000 qdisc noqueue
>> state UP
>> link/ether a0:36:9f:2a:70:92 brd ff:ff:ff:ff:ff:ff
>>
>> iSCSI traffic is sent via the tge1.4 and tge2.5 VLAN interfaces
>>
>> 8: tge1.4@tge1: mtu 9000 qdisc noqueue
>> state UP
>> link/ether a0:36:9f:2a:70:90 brd ff:ff:ff:ff:ff:ff
>> inet 10.0.1.18/24 brd 10.0.1.255 scope global tge1.4
>> valid_lft forever preferred_lft forever
>> inet6 fe80::a236:9fff:fe2a:7090/64 scope link
>> valid_lft forever preferred_lft forever
>> 9: tge2.5@tge2: mtu 9000 qdisc noqueue
>> state UP
>> link/ether a0:36:9f:2a:70:92 brd ff:ff:ff:ff:ff:ff
>> inet 10.0.2.18/24 brd 10.0.2.255 scope global tge2.5
>> valid_lft forever preferred_lft forever
>> inet6 fe80::a236:9fff:fe2a:7092/64 scope link
>> valid_lft forever preferred_lft forever
>>
>>
>> Multipathd is working with the following config:
>> devices {
>>     device {
>>         vendor "DGC"
>>         product "*"
>>         product_blacklist "LUNZ"
>>         path_grouping_policy "group_by_prio"
>>         getuid_callout "/sbin/scsi_id -g -u -d /dev/%n"
>>         path_selector "round-robin 0"
>>         features "1 queue_if_no_path"
>>         hardware_handler "1 alua"
>>         prio "alua"
>>         path_checker "emc_clariion"
>>         no_path_retry 60
>>         failback immediate
>>         rr_weight uniform
>>         rr_min_io 1000
>>     }
>> }
>>
>>
>>
>> iSCSI config:
>>
>> iscsid.startup = /etc/rc.d/init.d/iscsid force-start
>> node.startup = automatic
>> node.leading_login = No
>> node.session.timeo.replacement_timeout = 10
>> node.conn[0].timeo.login_timeout = 5
>> node.conn[0].timeo.logout_timeout = 5
>> node.conn[0].timeo.noop_out_interval = 5
>> node.conn[0].timeo.noop_out_timeout = 5
>> node.session.err_timeo.abort_timeout = 10
>> node.session.err_timeo.lu_reset_timeout = 10
>> node.session.err_timeo.tgt_reset_timeout = 10
>> node.session.initial_login_retry_max = 8
>> node.session.cmds_max = 512
>> node.session.queue_depth = 32
>> node.session.xmit_thread_priority = -20
>> node.session.iscsi.InitialR2T = No
>> node.session.iscsi.ImmediateData = Yes
>> node.session.iscsi.FirstBurstLength = 262144
>> node.session.iscsi.MaxBurstLength = 16776192
>> node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
>> node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
>> discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
>> node.conn[0].iscsi.HeaderDigest = None
>> node.session.nr_sessions = 1
>> node.session.iscsi.FastAbort = Yes
>>
>>
>>
>> XM config for the virtual machine:
>>
>> bootloader = "/usr/bin/pygrub"
>> vcpus = "1"
>> memory = "400"
>> name = "p39glp9m68muq2"
>>
>> vif = [ "mac=00:16:3e:84:XX:XX, bridge=x0evss6g1ztoa4, ip=XX.XX.XX.XX,
>> vifname=gh4txstv16yoaw, rate=100Mb/s" ]
>>
>> disk = [ "phy:/dev/onapp-p65uo6l3rgns6n/x5c58a6aj8cgiw,xvda1,w",
>> "phy:/dev/onapp-p65uo6l3rgns6n/s89aa4a3m3uzi2,xvda2,w" ]
>> vfb = [ "type=vnc,vncpasswd=lysuad" ]
>>
>>
>>
>> Thanks for all,
>> - Armin
>>