From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ed L. Cashin"
Subject: 2.6.16.11 BUG at tg3.c:2917
Date: Thu, 27 Apr 2006 12:52:34 -0400
Message-ID: <20060427165234.GC29045@coraid.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "David S. Miller"
Return-path:
Received: from ns1.coraid.com ([65.14.39.133]:62805 "EHLO coraid.com") by vger.kernel.org with ESMTP id S965158AbWD0RKm (ORCPT ); Thu, 27 Apr 2006 13:10:42 -0400
To: netdev@vger.kernel.org
Content-Disposition: inline
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

Hi.

On 2.6.15.7 and 2.6.16.11, I have seen panics under heavy NFS write load on an x86_64 system with two onboard Broadcom gigabit NICs. It's a Supermicro P8SCi motherboard with an EM64T Intel CPU. The aoe driver in use is the aoe6-26 driver from the Coraid website. I haven't yet trimmed down the test case or tried the aoe driver that ships with 2.6.16.11.

Right now the kernel NFS server is exporting an XFS filesystem on a logical volume backed by 3 AoE devices.

I'm including two panics below. There's a relevant-looking discussion of the same bug from May 2004 at this URL:
http://oss.sgi.com/projects/netdev/archive/2004-05/msg00378.html

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/tg3.c:2917
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: nfsd lockd nfs_acl sunrpc xfs exportfs dm_mod aoe ipv6 rtc piix i2c_i801 psmouse evdev i2c_core unix
Pid: 3053, comm: nfsd Not tainted 2.6.16.11-c1 #1
RIP: 0010:[] {tg3_poll+179}
RSP: 0000:ffffffff8039cc38 EFLAGS: 00010246
RAX: 00000000000001fb RBX: 0000000000000000 RCX: 0000000000000003
RDX: 0000000000000038 RSI: ffff81003f03f180 RDI: ffff810001fbb980
RBP: ffff81003d82df88 R08: 0000000000000400 R09: ffff81003e5fae18
R10: ffff81003ee86a80 R11: 00000000000000c4 R12: ffff81003f0d0500
R13: 00000000000001fb R14: 0000000000000016 R15: ffff810023088c30
FS: 00002b4cde2ee6d0(0000) GS:ffffffff803e6000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000438010 CR3: 0000000025729000 CR4: 00000000000006e0
Process nfsd (pid: 3053, threadinfo ffff81003d6fc000, task ffff81003f304140)
Stack: 0000000000000046 ffffffff802427b4 ffffffff8039ccd4 ffff81003f0d0000
       ffff81003dfec000 000000140000002c 00000000000000ca 00ca8100000000ca
       ffff81003dfdd920 ffff81003f0d059c
Call Trace: {task_in_intr+240} {net_rx_action+165}
       {__do_softirq+86} {call_softirq+30}
       {do_softirq+44} {local_bh_enable+105}
       {dev_queue_xmit+551} {:aoe:aoenet_xmit+26}
       {:aoe:aoeblk_make_request+413} {generic_make_request+335}
       {:dm_mod:__map_bio+66} {:dm_mod:__split_bio+365}
       {:xfs:linvfs_get_block+0} {:dm_mod:dm_request+262}
       {generic_make_request+335} {submit_bio+184}
       {:xfs:xfs_buf_iorequest+828} {default_wake_function+0}
       {:xfs:xfs_buf_associate_memory+117} {:xfs:xlog_bdstrat_cb+22}
       {:xfs:xlog_state_release_iclog+695} {:xfs:xlog_write+1509}
       {:xfs:xfs_log_write+42} {:xfs:_xfs_trans_commit+1294}
       {:xfs:kmem_zone_alloc+73} {:xfs:kmem_zone_zalloc+28}
       {:xfs:xfs_itruncate_finish+530} {:xfs:xfs_inactive_free_eofblocks+384}
       {:xfs:linvfs_release+0} {:xfs:xfs_release+152}
       {:xfs:linvfs_release+23} {__fput+155}
       {:nfsd:nfsd_write+196} {:nfsd:nfsd3_proc_write+231}
       {:nfsd:nfsd_dispatch+221} {:sunrpc:svc_process+975}
       {__down_read+18} {:nfsd:nfsd+451}
       {child_rip+8} {:nfsd:nfsd+0}
       {child_rip+0}

Code: 0f 0b 68 83 5f 2f 80 c2 65 0b 49 8b 44 24 40 8b 93 88 00 00
RIP {tg3_poll+179} RSP
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/tg3.c:2914
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: nfsd lockd nfs_acl sunrpc dm_mod aoe xfs exportfs ipv6 i2c_i801 i2c_core piix md_mod rtc psmouse unix
Pid: 88, comm: kswapd0 Not tainted 2.6.15.7-c1 #1
RIP: 0010:[] {tg3_poll+179}
RSP: 0000:ffffffff80395e08 EFLAGS: 00010246
RAX: 0000000000000066 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000000028 RSI: ffff81003e999d80 RDI: ffff810001fbba40
RBP: ffff81003dd63990 R08: ffffffff80395ea8 R09: ffff81003dc2ce18
R10: 000000000000003a R11: ffffffff80395ea8 R12: ffff81003f1a3500
R13: 0000000000000066 R14: 00000000000000a9 R15: ffffffff80395f08
FS: 0000000000000000(0000) GS:ffffffff803e1800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000004a12a7 CR3: 00000000077ab000 CR4: 00000000000006e0
Process kswapd0 (pid: 88, threadinfo ffff81003f5d8000, task ffff81003f594790)
Stack: ffffffff803c8980 0000000000001d4c ffffffff80395ea4 ffff81003f1a3000
       ffff81003db45000 0000004000000000 0000000000000049 004900000000003b
       ffff81003e52c740 ffff81003f1a359c
Call Trace: {net_rx_action+165} {__do_softirq+86}
       {call_softirq+31} {do_softirq+44}
       {do_IRQ+52} {ret_from_intr+0}
       {cache_flusharray+30} {:xfs:linvfs_release_page+0}
       {_write_unlock_irqrestore+9} {test_clear_page_dirty+152}
       {try_to_free_buffers+116} {:xfs:linvfs_release_page+0}
       {:xfs:linvfs_release_page+133} {shrink_zone+2695}
       {activate_task+140} {try_to_wake_up+1110}
       {balance_pgdat+535} {kswapd+256}
       {autoremove_wake_function+0} {child_rip+8}
       {kswapd+0} {child_rip+0}

Code: 0f 0b 68 ba 2d 2f 80 c2 62 0b 49 8b 44 24 40 8b 93 80 00 00
RIP {tg3_poll+179} RSP
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!

-- 
Ed L Cashin
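For readers matching these traces against source: the leading "Code:" bytes in both oopses appear to be the BUG() site itself. The following is a minimal Python sketch, assuming the 2.6-era x86_64 BUG() layout of `ud2; pushq $__FILE__; ret $__LINE__` (opcode bytes 0f 0b, 68 imm32, c2 imm16). The helper name decode_bug_frame is hypothetical and not part of any kernel or ksymoops tooling. Under that assumption, the `ud2` instruction explains the "invalid opcode"/"invalid operand" line, and the 16-bit immediate after `c2` matches the tg3.c line numbers 2917 and 2914 reported above.

```python
def decode_bug_frame(code_line: str) -> tuple[int, int]:
    """Return (filename_ptr, line) from the start of an oops "Code:" line.

    Assumed 2.6-era x86_64 BUG() encoding (an assumption, not verified
    against every kernel version):
      0f 0b            ud2           -> traps as "invalid opcode"
      68 xx xx xx xx   pushq $imm32  -> pointer to the __FILE__ string
      c2 yy yy         ret $imm16    -> __LINE__
    """
    hex_bytes = code_line.split(":", 1)[1].split()
    data = bytes(int(b, 16) for b in hex_bytes)
    if len(data) < 10 or data[0:2] != b"\x0f\x0b" or data[2] != 0x68 or data[7] != 0xC2:
        raise ValueError("not a ud2/pushq/ret BUG() frame")
    # pushq sign-extends its 32-bit immediate to 64 bits.
    ptr32 = int.from_bytes(data[3:7], "little", signed=True)
    filename_ptr = ptr32 & 0xFFFFFFFFFFFFFFFF
    line = int.from_bytes(data[8:10], "little")
    return filename_ptr, line


# "Code:" lines copied from the two oopses above.
first = "Code: 0f 0b 68 83 5f 2f 80 c2 65 0b 49 8b 44 24 40 8b 93 88 00 00"
second = "Code: 0f 0b 68 ba 2d 2f 80 c2 62 0b 49 8b 44 24 40 8b 93 80 00 00"

for kernel, code in (("2.6.16.11", first), ("2.6.15.7", second)):
    ptr, lineno = decode_bug_frame(code)
    print(f"{kernel}: file string at {ptr:#018x}, line {lineno}")
# prints:
# 2.6.16.11: file string at 0xffffffff802f5f83, line 2917
# 2.6.15.7: file string at 0xffffffff802f2dba, line 2914
```

The line number is read directly from the immediate rather than from the "Kernel BUG at" message, so the two should agree whenever the assumed encoding holds.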