netdev.vger.kernel.org archive mirror
* NULL deref in bnx2 / crashes ? ( was: netconsole leads to stalled CPU task )
From: Sylvain Munaut @ 2012-08-22 10:53 UTC
  To: netdev

Hi again, a bit more detail:

> I'm trying to use netconsole to feed kernel messages to the outside,
> but this leads to a stall ...
>
> This only happens in a fairly specific configuration where you have a
> bridge over vlan over bonding.
> I tested with only (bridge over vlan) and (vlan over bonding) and
> those work fine.
>
> [snip ... see original mail for all details]

I was previously testing under Xen.

For this round of tests, I ran the kernel natively. I also
included Dave Miller's pending series ( e0e3cea4... ) since there was a
patch related to netconsole and bridging / ...
So in the end, it's 3.6-rc2 + Dave Miller's tree (commit e0e3cea4) +
the pf malloc patch + the ip pmtu patch from Eric Dumazet.

I now see more debug output when I load netconsole in that config:

[   88.705138] netpoll: netconsole: local port 8888
[   88.705140] netpoll: netconsole: local IP 10.208.1.30
[   88.705141] netpoll: netconsole: interface 'mgmt'
[   88.705142] netpoll: netconsole: remote port 8000
[   88.705143] netpoll: netconsole: remote IP 10.208.1.3
[   88.705144] netpoll: netconsole: remote ethernet address 00:16:3e:1a:37:37
[   88.705469] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   88.705475] IP: [<ffffffffa0006653>] bnx2_start_xmit+0x20b/0x539 [bnx2]
[   88.705476] PGD 0
[   88.705478] Oops: 0002 [#1] PREEMPT SMP
[   88.705509] Modules linked in: netconsole(+) configfs nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc bridge 8021q garp stp llc bonding ext2 iTCO_wdt iTCO_vendor_support lpc_ich mfd_core coretemp joydev kvm evdev crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 aes_generic acpi_power_meter psmouse serio_raw dcdbas processor ablk_helper i7core_edac pcspkr cryptd edac_core microcode button hid_generic ext4 crc16 jbd2 mbcache dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear md_mod sr_mod usbhid cdrom hid ses sd_mod enclosure crc_t10dif usb_storage ata_generic pata_acpi uas uhci_hcd megaraid_sas ata_piix ehci_hcd libata usbcore scsi_mod usb_common bnx2
[   88.705511] CPU 2
[   88.705512] Pid: 3017, comm: modprobe Not tainted 3.6.0-rc2-00092-g9040592-dirty #6 Dell Inc. PowerEdge R610/0F0XJ6
[   88.705515] RIP: 0010:[<ffffffffa0006653>]  [<ffffffffa0006653>] bnx2_start_xmit+0x20b/0x539 [bnx2]
[   88.705516] RSP: 0018:ffff88061e8fda28  EFLAGS: 00010002
[   88.705517] RAX: 0000000000000000 RBX: ffff8803200f2300 RCX: 0000000000000000
[   88.705519] RDX: 0000000320a95c02 RSI: 0000000000000003 RDI: ffff8800cb36f000
[   88.705519] RBP: ffff88031f814000 R08: 0000000000000054 R09: 0000000000000000
[   88.705520] R10: 000000000000ffff R11: 0000000000000000 R12: ffff8803215d52c0
[   88.705521] R13: ffff8803210e13c0 R14: 0000000000010008 R15: 0000000000000000
[   88.705522] FS:  00007fe9d0854700(0000) GS:ffff88062fc20000(0000) knlGS:0000000000000000
[   88.705523] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   88.705524] CR2: 0000000000000008 CR3: 0000000619ccb000 CR4: 00000000000007e0
[   88.705525] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   88.705526] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   88.705528] Process modprobe (pid: 3017, threadinfo ffff88061e8fc000, task ffff8806205e8000)
[   88.705528] Stack:
[   88.705530]  ffff88062ffecd80 0000000320a95c02 0000000000000054 ffffffff00000000
[   88.705532]  0000000000000041 ffff8803215d55f8 ffff88031f8167d8 ffffffff00000000
[   88.705534]  0000000000000000 0000000100000000 ffff88062ffedb08 ffff8803200f2300
[   88.705534] Call Trace:
[   88.705542]  [<ffffffff81280a76>] ? netpoll_send_skb_on_dev+0x201/0x31d
[   88.705546]  [<ffffffffa007fc4c>] ? bond_dev_queue_xmit+0x62/0x7f [bonding]
[   88.705549]  [<ffffffffa0084588>] ? bond_3ad_xmit_xor+0xe7/0x10c [bonding]
[   88.705552]  [<ffffffffa007fffd>] ? bond_start_xmit+0x394/0x3ff [bonding]
[   88.705554]  [<ffffffff81280a76>] ? netpoll_send_skb_on_dev+0x201/0x31d
[   88.705558]  [<ffffffffa004afd5>] ? vlan_dev_hard_start_xmit+0xab/0xf6 [8021q]
[   88.705559]  [<ffffffff81280a76>] ? netpoll_send_skb_on_dev+0x201/0x31d
[   88.705564]  [<ffffffffa00938e8>] ? __br_deliver+0x93/0xbe [bridge]
[   88.705567]  [<ffffffffa009237d>] ? br_dev_xmit+0x14a/0x16b [bridge]
[   88.705569]  [<ffffffff81280a76>] ? netpoll_send_skb_on_dev+0x201/0x31d
[   88.705570]  [<ffffffff81280372>] ? find_skb.isra.23+0x31/0x78
[   88.705572]  [<ffffffff81280bbe>] ? netpoll_send_skb+0x2c/0x39
[   88.705574]  [<ffffffffa00a222a>] ? write_msg+0x98/0xf3 [netconsole]
[   88.705579]  [<ffffffff81037db2>] ? call_console_drivers.constprop.17+0x6e/0x7d
[   88.705580]  [<ffffffff81038248>] ? console_unlock+0x2ab/0x351
[   88.705582]  [<ffffffff81039112>] ? register_console+0x273/0x303
[   88.705584]  [<ffffffffa00fa182>] ? init_netconsole+0x182/0x210 [netconsole]
[   88.705586]  [<ffffffffa00fa000>] ? 0xffffffffa00f9fff
[   88.705588]  [<ffffffff81002085>] ? do_one_initcall+0x75/0x12c
[   88.705590]  [<ffffffff81077b35>] ? sys_init_module+0x80/0x1c5
[   88.705593]  [<ffffffff813319b9>] ? system_call_fastpath+0x16/0x1b
[   88.705606] Code: 41 c1 e1 10 48 89 d6 48 6b c8 18 48 c1 e0 04 48 c1 ee 20 49 03 8c 24 50 03 00 00 45 09 c8 44 89 4c 24 38 c7 44 24 24 00 00 00 00 <48> 89 51 08 48 89 19 49 03 84 24 48 03 00 00 89 50 04 44 89 f2
[   88.705608] RIP  [<ffffffffa0006653>] bnx2_start_xmit+0x20b/0x539 [bnx2]
[   88.705609]  RSP <ffff88061e8fda28>
[   88.705609] CR2: 0000000000000008
[   88.705611] ---[ end trace 24b75fe520341c20 ]---
[   88.705985] note: modprobe[3017] exited with preempt_count 6
[   88.706135] Dead loop on virtual device mgmt, fix it urgently!
[   88.706201] Dead loop on virtual device mgmt, fix it urgently!
[  148.557967] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=60002 jiffies)
[  148.557967] INFO: Stall ended before state dump start
[  328.112761] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 2, t=240007 jiffies)
[  328.112761] INFO: Stall ended before state dump start
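
For anyone reading the oops above, here is a minimal Python sketch that decodes the "Oops: 0002" error code and the faulting instruction marked <48> in the Code: line. The bit meanings are the standard x86 page-fault error code bits. The helper names (decode_oops_code, decode_mov_store) are made up for illustration, and the instruction decoder only handles the one encoding seen here; it is not a general disassembler.

```python
# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

# Low eight 64-bit registers, indexed by their 3-bit ModRM encoding.
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_oops_code(code):
    """Return the human-readable meaning of each relevant error-code bit."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text:
            out.append(text)
    return out


def decode_mov_store(insn):
    """Decode only 'REX.W 89 /r' with an 8-bit displacement and no SIB byte."""
    rex, opcode, modrm, disp8 = insn[:4]
    assert rex == 0x48 and opcode == 0x89, "not REX.W MOV r/m64, r64"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 0b01 and rm != 0b100, "only [reg + disp8] is handled"
    return REGS64[reg], REGS64[rm], disp8


# Values copied from the oops text.
oops_code = int("0002", 16)            # "Oops: 0002 [#1] PREEMPT SMP"
faulting = bytes.fromhex("48895108")   # "<48> 89 51 08" in the Code: line
registers = {"rcx": 0x0, "rdx": 0x0000000320A95C02}
cr2 = 0x8

meaning = decode_oops_code(oops_code)
src, base, disp = decode_mov_store(faulting)
fault_addr = registers[base] + disp

print(meaning)
print(f"mov %{src}, {disp:#x}(%{base})  -> faults at {fault_addr:#x}")
```

Under these assumptions the fault is a kernel-mode write through RCX, which is NULL, at offset 8. That is consistent with CR2 = 0000000000000008 in the dump.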


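To make the device stacking in the call trace easier to see, here is a small Python sketch that pulls the module names out of the trace lines and lists them from outermost to innermost caller. The trace text is copied from the oops above (with the wrapped vlan line rejoined); the regex and function names are illustrative assumptions. Note that every frame is marked "?", so the kernel itself does not vouch for these entries and the resulting chain is suggestive only.

```python
import re

# Frames from write_msg down to the innermost entry, as printed in the oops.
TRACE = """\
[   88.705542]  [<ffffffff81280a76>] ? netpoll_send_skb_on_dev+0x201/0x31d
[   88.705546]  [<ffffffffa007fc4c>] ? bond_dev_queue_xmit+0x62/0x7f [bonding]
[   88.705549]  [<ffffffffa0084588>] ? bond_3ad_xmit_xor+0xe7/0x10c [bonding]
[   88.705552]  [<ffffffffa007fffd>] ? bond_start_xmit+0x394/0x3ff [bonding]
[   88.705554]  [<ffffffff81280a76>] ? netpoll_send_skb_on_dev+0x201/0x31d
[   88.705558]  [<ffffffffa004afd5>] ? vlan_dev_hard_start_xmit+0xab/0xf6 [8021q]
[   88.705559]  [<ffffffff81280a76>] ? netpoll_send_skb_on_dev+0x201/0x31d
[   88.705564]  [<ffffffffa00938e8>] ? __br_deliver+0x93/0xbe [bridge]
[   88.705567]  [<ffffffffa009237d>] ? br_dev_xmit+0x14a/0x16b [bridge]
[   88.705569]  [<ffffffff81280a76>] ? netpoll_send_skb_on_dev+0x201/0x31d
[   88.705570]  [<ffffffff81280372>] ? find_skb.isra.23+0x31/0x78
[   88.705572]  [<ffffffff81280bbe>] ? netpoll_send_skb+0x2c/0x39
[   88.705574]  [<ffffffffa00a222a>] ? write_msg+0x98/0xf3 [netconsole]
"""

# Matches "[<addr>] ? func+0xoff/0xsize [module]"; the module part is optional.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \? ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def parse_frames(trace):
    """Return (function, module-or-None) tuples in printed (innermost-first) order."""
    frames = []
    for line in trace.splitlines():
        m = FRAME.search(line)
        if m:
            frames.append((m.group(1), m.group(2)))
    return frames


def module_chain(frames):
    """Outermost-to-innermost list of modules, consecutive duplicates collapsed."""
    chain = []
    for _func, module in reversed(frames):
        if module and (not chain or chain[-1] != module):
            chain.append(module)
    return chain


frames = parse_frames(TRACE)
chain = module_chain(frames)
netpoll_hops = sum(1 for func, _ in frames if func == "netpoll_send_skb_on_dev")

print(" -> ".join(chain))
print("netpoll_send_skb_on_dev entries:", netpoll_hops)
```

Read this way, the trace matches the bridge over vlan over bonding setup, with netpoll_send_skb_on_dev showing up once per layer before the faulting bnx2_start_xmit reported in RIP.
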
On another machine that has Intel network cards, loading netconsole
just freezes the machine completely ... nothing even gets printed on
the screen or anywhere else I can see.

Note that this also fails on 3.5.1, so it's not new behavior.
3.2.x doesn't support netconsole over vlan at all, so I can't
test on it.

Cheers,

    Sylvain Munaut
