* Repeatable kernel splat in 3.3.8+, related to ip_rcv_finish
From: Ben Greear @ 2013-01-16 0:33 UTC
To: netdev
I have a reproducible crash (5-15 minutes typically)
in a hacked 3.3.8+ kernel. No proprietary modules, but my normal mix of
networking patches are applied, so it could be my fault.
The crash always appears to be in the ip_rcv_finish function.
On a previous panic, the 'IP' was 0xFFFFFFFF, and I've seen
other similar values before. I'm not sure what this
implies.
Test case is 2000 mac-vlans, resetting about 50 of them at a time
during bringup, while also driving lower-speed NFS traffic on a
mount for each of the already-reset mac-vlans. It takes multiple
minutes for my app to fully reset all of the interfaces and start
traffic, and there are lots and lots of network changes going on
at or just before the crashes.
I need the nfs-bind-to-local-ip patches that I carry in my tree to
reproduce this, so I can't run this test on upstream kernels.
I will work on porting these into the 3.7 kernel for testing there
in the meantime.
Does this bug look familiar to anyone?
(gdb) l *(ip_rcv_finish+0x2ea)
0xffffffff814423e6 is in ip_rcv_finish (/home/greearb/git/linux-3.3.dev.y/net/ipv4/ip_input.c:365).
360 skb->len);
361 } else if (rt->rt_type == RTN_BROADCAST)
362 IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
363 skb->len);
364
365 return dst_input(skb);
366
367 drop:
368 kfree_skb(skb);
369 return NET_RX_DROP;
(gdb)
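For context on the listing above: line 365 is the last statement before control leaves ip_rcv_finish, and dst_input() is an indirect call through the input function pointer stored in the skb's dst entry. A bogus 'IP' value would be consistent with that pointer, or the dst entry holding it, being stale or overwritten, though that is an inference and not something the trace proves. A minimal userspace sketch of the call shape, using simplified stand-in structs and the hypothetical names dst_input_model and fake_local_deliver (none of this is the kernel's actual code):

```c
#include <assert.h>
#include <stddef.h>

struct sk_buff;

/* Stripped-down stand-in for the kernel's struct dst_entry. */
struct dst_entry {
    int (*input)(struct sk_buff *skb);  /* filled in when the route is set up */
};

/* Stripped-down stand-in for struct sk_buff; the real kernel reaches the
 * dst entry through skb_dst(skb) rather than a plain pointer field. */
struct sk_buff {
    struct dst_entry *dst;
    unsigned int len;
};

/* Same shape as the kernel's dst_input(): an indirect call through the
 * function pointer held in the skb's dst entry. If that pointer holds
 * garbage, the CPU jumps to the garbage address. */
int dst_input_model(struct sk_buff *skb)
{
    assert(skb != NULL && skb->dst != NULL);
    return skb->dst->input(skb);
}

/* Hypothetical handler standing in for ip_local_deliver() or ip_forward(). */
int fake_local_deliver(struct sk_buff *skb)
{
    return (int)skb->len;
}
```

With a valid dst entry the call simply lands in the handler. If the dst entry were freed or reused while the skb still pointed at it, the same call would jump to whatever value now sits in the input slot, which would match a fault where the faulting address equals the instruction pointer.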
BUG: unable to handle kernel paging request at 0000001d00088000
IP: [<0000001d00088000>] 0x1d00087fff
PGD 372cb6067 PUD 0
Oops: 0010 [#1] PREEMPT SMP
CPU 11
Modules linked in: nfs nfs_acl auth_rpcgss fscache 8021q garp stp llc lockd sunrpc macvlan pktgen microcode pcspkr i2c_i801 i2c_core i7core_edac e1000e iTCO_wdt
iTCO_vendor_support ioatdma igb dca edac_core uinput ipv6 [last unloaded: scsi_wait_scan]
Pid: 67, comm: ksoftirqd/11 Tainted: G O 3.3.8+ #55 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000001d00088000>] [<0000001d00088000>] 0x1d00087fff
RSP: 0018:ffff88040974dc78 EFLAGS: 00010286
RAX: ffff88038526e500 RBX: ffff88038526eb00 RCX: ffff88038526eb00
RDX: 0000000000000020 RSI: 0000000000000002 RDI: ffff88038526eb00
RBP: ffff88040974dca0 R08: ffffffff814420fc R09: 0000000000000000
R10: ffff88040974d710 R11: ffffffff80000000 R12: ffff880357f4fcfc
R13: ffff880409295000 R14: 0000000000000000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88041fd60000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000001d00088000 CR3: 00000003958bd000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/11 (pid: 67, threadinfo ffff88040974c000, task ffff880409741740)
Stack:
ffffffff814423e6 ffff88038526eb00 ffffffff814420fc ffff880409295000
0000000000000000 ffff88040974dcd0 ffffffff8144274e ffff880480000000
ffff880409741740 ffff88038526eb00 ffff880409295000 ffff88040974dd00
Call Trace:
[<ffffffff814423e6>] ? ip_rcv_finish+0x2ea/0x302
[<ffffffff814420fc>] ? inet_del_protocol+0x37/0x37
eth3#739: no IPv6 routers present
[<ffffffff8144274e>] NF_HOOK.clone.1+0x4c/0x53
[<ffffffff814429d9>] ip_rcv+0x237/0x262
[<ffffffff8140ea4f>] __netif_receive_skb+0x477/0x4c0
[<ffffffff8140eb8c>] process_backlog+0xf4/0x1d6
[<ffffffff81410ae8>] net_rx_action+0xad/0x1e9
[<ffffffff8105b105>] __do_softirq+0x86/0x12f
[<ffffffff8105b261>] run_ksoftirqd+0xb3/0x1a6
[<ffffffff8105b1ae>] ? __do_softirq+0x12f/0x12f
[<ffffffff8105b1ae>] ? __do_softirq+0x12f/0x12f
[<ffffffff8106da7d>] kthread+0x84/0x8c
[<ffffffff814cc4e4>] kernel_thread_helper+0x4/0x10
[<ffffffff8106d9f9>] ? __init_kthread_worker+0x37/0x37
[<ffffffff814cc4e0>] ? gs_change+0x13/0x13
Code: Bad RIP value.
RIP [<0000001d00088000>] 0x1d00087fff
RSP <ffff88040974dc78>
CR2: 0000001d00088000
---[ end trace 1d145cfe9c5c5d55 ]---
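As an aid to reading the "Oops: 0010" line in the trace above, the value is the x86 page-fault error code printed in hex. The following is a small decoding sketch, not kernel code; the macro names are modeled on the kernel's PF_* flags, and pf_is_instruction_fetch and pf_describe are hypothetical helpers introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* x86 page-fault error code bits (names modeled on the kernel's PF_* flags). */
#define PF_PROT   (1UL << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE  (1UL << 1)  /* 0: read access, 1: write access */
#define PF_USER   (1UL << 2)  /* 0: kernel mode, 1: user mode */
#define PF_RSVD   (1UL << 3)  /* reserved bit set in a page-table entry */
#define PF_INSTR  (1UL << 4)  /* fault happened on an instruction fetch */

/* Returns nonzero when the fault was caused by fetching an instruction. */
int pf_is_instruction_fetch(unsigned long code)
{
    return (code & PF_INSTR) != 0;
}

/* Writes a one-line, human-readable summary of the error code into buf. */
void pf_describe(unsigned long code, char *buf, size_t len)
{
    const char *access;

    assert(buf != NULL && len > 0);
    if (code & PF_INSTR)
        access = "instruction fetch";
    else if (code & PF_WRITE)
        access = "write";
    else
        access = "read";

    snprintf(buf, len, "%s, %s mode, %s",
             access,
             (code & PF_USER) ? "user" : "kernel",
             (code & PF_PROT) ? "protection violation" : "page not present");
}
```

Calling pf_describe(0x0010, buf, sizeof buf) produces "instruction fetch, kernel mode, page not present". That reading fits the rest of the report: RIP and CR2 hold the same address, and the "Code:" line says "Bad RIP value", so the CPU faulted while trying to execute at an unmapped address.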
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: Repeatable kernel splat in 3.3.8+, related to ip_rcv_finish
From: Ben Greear @ 2013-01-16 2:00 UTC
To: netdev
On 01/15/2013 04:33 PM, Ben Greear wrote:
> I have a reproducible crash (5-15 minutes typically)
> in a hacked 3.3.8+ kernel. No proprietary modules, but my normal mix of
> networking patches are applied, so it could be my fault.
I see something similar in 3.7.2+, but still with my patches applied.
I'll work on a minimal patch set tomorrow that just fixes up NFS the way
I need it, and see if it's still reproducible.
(gdb) l *(ip_rcv_finish+0x2b7)
0xffffffff8149c933 is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:373).
368 skb->len);
369 } else if (rt->rt_type == RTN_BROADCAST)
370 IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
371 skb->len);
372
373 return dst_input(skb);
374
375 drop:
376 kfree_skb(skb);
377 return NET_RX_DROP;
(gdb)
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: nfnetlink_log nfnetlink bluetooth nfsv4 auth_rpcgss nfs fscache 8021q garp stp llc loe
CPU 10
Pid: 70, comm: rcuc/10 Tainted: G WC O 3.7.2+ #25 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000000000000000>] [< (null)>] (null)
RSP: 0018:ffff88041fd43d90 EFLAGS: 00010286
RAX: ffff8803527eaf00 RBX: ffff8803e1e47100 RCX: ffff8803e1e47100
RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff8803e1e47100
RBP: ffff88041fd43db8 R08: ffffffff8149c67c R09: ffff88041fd43d80
R10: ffffffff81a6f280 R11: ffff8803fff62940 R12: ffff8803dfa5f8fc
R13: ffff8803e1e47100 R14: ffff88040d38e000 R15: 0000000000000008
FS: 0000000000000000(0000) GS:ffff88041fd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/10 (pid: 70, threadinfo ffff88040d018000, task ffff88040d7fdcc0)
Stack:
ffffffff8149c933 ffff8803e1e47100 ffffffff8149c67c ffff8803e1e47100
ffff88040d38e000 ffff88041fd43de8 ffffffff8149cc98 0000000080000000
ffff88040d4e3e38 ffff8803e1e47100 ffff88040d38e000 ffff88041fd43e18
Call Trace:
<IRQ>
[<ffffffff8149c933>] ? ip_rcv_finish+0x2b7/0x2cf
[<ffffffff8149c67c>] ? inet_del_protocol+0x37/0x37
[<ffffffff8149cc98>] NF_HOOK.clone.1+0x4c/0x53
[<ffffffff8149cf23>] ip_rcv+0x237/0x268
[<ffffffff81468c29>] __netif_receive_skb+0x4da/0x583
[<ffffffff810a7535>] ? __wake_up_common+0x45/0x77
[<ffffffff81468dc8>] process_backlog+0xf6/0x1d7
[<ffffffff8146b17e>] net_rx_action+0xad/0x20c
[<ffffffff8108d292>] __do_softirq+0x9c/0x161
[<ffffffff8152aa1c>] call_softirq+0x1c/0x30
<EOI>
[<ffffffff8100bd21>] do_softirq+0x41/0x7e
[<ffffffff8108d413>] _local_bh_enable_ip+0x7a/0x9f
[<ffffffff8108d450>] local_bh_enable+0xd/0x11
[<ffffffff810ef54e>] rcu_cpu_kthread+0xe6/0x11f
[<ffffffff810a6f3b>] smpboot_thread_fn+0x253/0x258
[<ffffffff810a6ce8>] ? test_ti_thread_flag.clone.0+0x11/0x11
[<ffffffff8109ff60>] kthread+0xbf/0xc7
[<ffffffff8109fea1>] ? __init_kthread_worker+0x37/0x37
[<ffffffff8152977c>] ret_from_fork+0x7c/0xb0
[<ffffffff8109fea1>] ? __init_kthread_worker+0x37/0x37
Code: Bad RIP value.
RIP [< (null)>] (null)
RSP <ffff88041fd43d90>
CR2: 0000000000000000
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
PGD 0
Oops: 0010 [#2] PREEMPT SMP
Modules linked in: nfnetlink_log nfnetlink bluetooth nfsv4 auth_rpcgss nfs fscache 8021q garp stp llc loe
CPU 10
Pid: 72, comm: migration/10 Tainted: G WC O 3.7.2+ #25 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000000000000000>] [< (null)>] (null)
RSP: 0018:ffff88040d01d990 EFLAGS: 00010286
RAX: ffff8803527eaf00 RBX: ffff8803e1e46f00 RCX: ffff8803e1e46f00
RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff8803e1e46f00
RBP: ffff88040d01d9b8 R08: ffffffff8149c67c R09: ffff88040d01d980
R10: dead000000200200 R11: dead000000100100 R12: ffff8803dfa5acfc
R13: ffff8803e1e46f00 R14: ffff88040d38e000 R15: 0000000000000008
FS: 0000000000000000(0000) GS:ffff88041fd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007e0
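A side note on the register dump of oops #2 above: the R10 and R11 values match the kernel's list poison constants on x86_64, which list_del() writes into an entry's next/prev pointers after unlinking it. The sketch below only shows how to recognize those values; is_list_poison is a hypothetical helper, and since R10 and R11 are scratch registers their contents may be leftovers unrelated to the faulting path.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/linux/poison.h for x86_64, where the
 * poison pointer delta is 0xdead000000000000. */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (0x00100100ULL + POISON_POINTER_DELTA)  /* written to ->next */
#define LIST_POISON2 (0x00200200ULL + POISON_POINTER_DELTA)  /* written to ->prev */

/* Returns nonzero when value is one of the two list poison constants. */
int is_list_poison(uint64_t value)
{
    return value == LIST_POISON1 || value == LIST_POISON2;
}
```

Both 0xdead000000100100 (R11) and 0xdead000000200200 (R10) satisfy is_list_poison(), which suggests a list entry had recently been deleted on that CPU. Whether that deletion is connected to the bad dst pointer is not something the dump alone can establish.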
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com