From: "Christopher S. Aker"
Subject: Re: netback Oops then xenwatch stuck in D state
Date: Tue, 12 Feb 2013 21:51:53 -0500
Message-ID: <511AFFC9.3050404@theshore.net>
References: <510C3AA3.2090508@theshore.net> <50E3A390-C52B-476A-8B20-BADBA42F3775@theshore.net> <51181924.4050500@theshore.net> <1360583103.16636.29.camel@zion.uk.xensource.com> <1360663133.20449.123.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1360663133.20449.123.camel@zakaz.uk.xensource.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell
Cc: Wei Liu, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 2/12/13 4:58 AM, Ian Campbell wrote:
> Have you applied the XSA-39 fixes to this kernel?

Yes! When I rebuilt with Wei's suggested patch for my original netback/xenwatch problem, I also brought us up to date with the XSA patches.

We just hit the same (new) problem on another machine, and looking at the BUG with more kernel output context gives a giant clue:

Feb 12 20:30:54: IPv6: ADDRCONF(NETDEV_UP): vif21.0: link is not ready
Feb 12 20:30:54: device vif21.0 entered promiscuous mode
Feb 12 20:30:56: xen-blkback:ring-ref 8, event-channel 31, protocol 2 (x86_32-abi)
Feb 12 20:30:56: xen-blkback:ring-ref 9, event-channel 32, protocol 2 (x86_32-abi)
Feb 12 20:30:56: IPv6: ADDRCONF(NETDEV_CHANGE): vif21.0: link becomes ready
Feb 12 20:30:56: br0: port 5(vif21.0) entered forwarding state
Feb 12 20:30:56: br0: port 5(vif21.0) entered forwarding state
Feb 12 20:30:58: br0: port 5(vif21.0) entered forwarding state
Feb 12 20:34:12: vif vif-21-0 vif21.0: Frag is bigger than frame.
Feb 12 20:34:12: vif vif-21-0 vif21.0: fatal error; disabling device <--------------
Feb 12 20:34:12: BUG: unable to handle kernel NULL pointer dereference at 00000000000008b8
Feb 12 20:34:12: IP: [] xen_spin_lock_flags+0x3a/0x80
Feb 12 20:34:12: PGD 0
Feb 12 20:34:12: Oops: 0002 [#1] SMP
Feb 12 20:34:12: Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding ebtable_filter e1000e
Feb 12 20:34:12: CPU 3
Feb 12 20:34:12: Pid: 1548, comm: netback/3 Not tainted 3.7.6-1-x86_64 #1 Supermicro X8DT6/X8DT6
Feb 12 20:34:12: RIP: e030:[] [] xen_spin_lock_flags+0x3a/0x80
Feb 12 20:34:12: RSP: e02b:ffff880083681b58 EFLAGS: 00010006
Feb 12 20:34:12: RAX: 0000000000000400 RBX: 00000000000008b8 RCX: 0000000000000663
Feb 12 20:34:12: RDX: 0000000000000001 RSI: 0000000000000210 RDI: 00000000000008b8
Feb 12 20:34:12: RBP: ffff880083681b78 R08: 000000000000000d R09: 0000000000000000
Feb 12 20:34:12: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
Feb 12 20:34:12: R13: 0000000000000200 R14: 0000000000000400 R15: 0000000000000663
Feb 12 20:34:12: FS: 00007f2bc1fb2700(0000) GS:ffff8801006c0000(0000) knlGS:0000000000000000
Feb 12 20:34:12: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 12 20:34:12: CR2: 00000000000008b8 CR3: 0000000001c0b000 CR4: 0000000000002660
Feb 12 20:34:12: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 12 20:34:12: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 12 20:34:12: Process netback/3 (pid: 1548, threadinfo ffff880083680000, task ffff8800837ec9c0)
Feb 12 20:34:12: Stack:
Feb 12 20:34:12: 0000000000000210 00000000000008b8 ffff880003baa700 ffff880003baa7d8
Feb 12 20:34:12: ffff880083681b98 ffffffff817605da 0000000000000000 00000000000008b8
Feb 12 20:34:12: ffff880083681bd8 ffffffff8154446f ffff880003baa000 0000000000000000
Feb 12 20:34:12: Call Trace:
Feb 12 20:34:12: [] _raw_spin_lock_irqsave+0x2a/0x40
Feb 12 20:34:12: [] xen_netbk_schedule_xenvif+0x8f/0x100
Feb 12 20:34:12: [] xen_netbk_check_rx_xenvif+0x25/0x60
Feb 12 20:34:12: [] netbk_tx_err+0x5b/0x70
Feb 12 20:34:12: [] xen_netbk_tx_build_gops+0xb8c/0xbc0
Feb 12 20:34:12: [] ? __switch_to+0x160/0x4f0
Feb 12 20:34:12: [] ? idle_balance+0xf8/0x150
Feb 12 20:34:12: [] ? finish_task_switch+0x60/0xd0
Feb 12 20:34:12: [] ? __schedule+0x394/0x750
Feb 12 20:34:12: [] xen_netbk_kthread+0xef/0x9d0
Feb 12 20:34:12: [] ? finish_task_switch+0x60/0xd0
Feb 12 20:34:12: [] ? wake_up_bit+0x40/0x40
Feb 12 20:34:12: [] ? xen_netbk_tx_build_gops+0xbc0/0xbc0
Feb 12 20:34:12: [] kthread+0xc6/0xd0
Feb 12 20:34:12: [] ? xen_end_context_switch+0x19/0x20
Feb 12 20:34:12: [] ? kthread_freezable_should_stop+0x70/0x70
Feb 12 20:34:12: [] ret_from_fork+0x7c/0xb0
Feb 12 20:34:12: [] ? kthread_freezable_should_stop+0x70/0x70
Feb 12 20:34:12: Code: 24 08 4c 89 6c 24 10 4c 89 74 24 18 49 89 f5 48 89 fb 41 81 e5 00 02 00 00 41 bc 01 00 00 00 41 be 00 04 00 00 44 89 f0 44 89 e2 <86> 13 84 d2 74 0b f3 90 80 3b 00 74 f3 ff c8 75 f5 84 d2 75 15
Feb 12 20:34:12: RIP [] xen_spin_lock_flags+0x3a/0x80
Feb 12 20:34:12: RSP
Feb 12 20:34:12: CR2: 00000000000008b8
Feb 12 20:34:12: ---[ end trace ae243211c8c8cba5 ]---

https://lkml.org/lkml/2013/2/12/575 - "xen/netback: shut down the ring if it contains garbage"

-Chris