ixgbe NULL pointer dereference on OOM condition, 2.6.31.7

netdev.vger.kernel.org archive mirror

* ixgbe NULL pointer dereference on OOM condition, 2.6.31.7
From: Ben Greear @ 2010-01-04 19:57 UTC
  To: NetDev

This is on a hacked 2.6.31.7 kernel.  I'm testing an application that creates
30,000+ TCP connections (to self).  The system is 64-bit with 12GB of RAM, but
it can still run out of usable RAM (say, when I start another 10k connections
to bring it up to 40k).

It looks like something in ixgbe isn't properly checking for inability
to allocate (or to have previously allocated) an skb, or perhaps some other
chunk of memory:

[root@ct503-10G-09 ~]# BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
IP: [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1/stat
CPU 6
Modules linked in: 8021q garp stp llc veth fuse arc4 michael_mic macvlan wanlink(P) pktgen sunrpc ipv6 dm_multipath uinput ixg]
Pid: 33, comm: events/6 Tainted: P           2.6.31.7 #11 X8STi
RIP: 0010:[<ffffffffa0054ca0>]  [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
RSP: 0018:ffff8800280e5dc0  EFLAGS: 00010287
RAX: ffff88030a838000 RBX: 0000000000000000 RCX: ffff88030a838000
RDX: ffff8800280e5e64 RSI: 0000000000000000 RDI: ffff88032e9330c0
RBP: ffff8800280e5e40 R08: 000000000000004e R09: ffff88032e933680
R10: ffff88033fc08000 R11: 0000000000000080 R12: ffffc90014dd9000
R13: ffff88033043d1e0 R14: ffff88032dcd05c0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8800280e2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000e8 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/6 (pid: 33, threadinfo ffff880332594000, task ffff880332573720)
Stack:
  000000000000002b 000000402490b420 ffff8800280e5e64 ffff88032e9330c0
<0> ffff880332744800 ffff88030a838000 000000000000002b 0000006300000000
<0> 0000000000000000 0000000000000000 ffff8800280e5e30 ffff88033043d1e0
Call Trace:
  <IRQ>
  [<ffffffffa00575c8>] ixgbe_clean_rxonly+0x6b/0xbd [ixgbe]
  [<ffffffff8135aa1e>] net_rx_action+0xd0/0x248
  [<ffffffffa0054569>] ? napi_schedule+0x1d/0x22 [ixgbe]
  [<ffffffff81056772>] __do_softirq+0x10e/0x21a
  [<ffffffff81096912>] ? handle_IRQ_event+0xa2/0x1a5
  [<ffffffff81012ddc>] call_softirq+0x1c/0x30
  [<ffffffff81014533>] do_softirq+0x42/0x8b
  [<ffffffff810565b2>] irq_exit+0x3f/0x94
  [<ffffffff810142cd>] do_IRQ+0x94/0xab
  [<ffffffff810125d3>] ret_from_intr+0x0/0x11
  <EOI>
  [<ffffffffa0059523>] ? ixgbe_disable_pcie_master+0x80/0xa9 [ixgbe]
  [<ffffffffa005ff89>] ? ixgbe_reset_hw_82599+0x63/0x19f [ixgbe]
  [<ffffffffa0059063>] ? ixgbe_init_hw_generic+0xf/0x1d [ixgbe]
  [<ffffffffa00530c3>] ? ixgbe_reset+0x1e/0xef [ixgbe]
  [<ffffffffa005657c>] ? ixgbe_down+0x1e6/0x24f [ixgbe]
  [<ffffffffa0057a13>] ? ixgbe_reset_task+0x0/0x24 [ixgbe]
  [<ffffffffa0057549>] ? ixgbe_reinit_locked+0x57/0x6b [ixgbe]
  [<ffffffffa0057a35>] ? ixgbe_reset_task+0x22/0x24 [ixgbe]
  [<ffffffff81063045>] ? worker_thread+0x19a/0x244
  [<ffffffff81066d5b>] ? autoremove_wake_function+0x0/0x38
  [<ffffffff81062eab>] ? worker_thread+0x0/0x244
  [<ffffffff81066ae4>] ? kthread+0x7b/0x83
  [<ffffffff81012cda>] ? child_rip+0xa/0x20
  [<ffffffff81066a69>] ? kthread+0x0/0x83
  [<ffffffff81012cd0>] ? child_rip+0x0/0x20
Code: f8 05 3d 00 01 00 00 44 0f 46 c0 48 8b 45 a8 44 0f b7 78 0c eb 0c 48 8b 55 a8 45 31 ff 44 0f b7 42 0c 49 8b 1c 24 49 8b
RIP  [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
  RSP <ffff8800280e5dc0>
CR2: 00000000000000e8
BUG: unable to handle kernel
---[ end trace db48be5c67f6f225 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 33, comm: events/6 Tainted: P      D    2.6.31.7 #11
Call Trace:
  <IRQ>  [<ffffffff81050a4b>] panic+0xaf/0x16e
  [<ffffffff810127a3>] ? apic_timer_interrupt+0x13/0x20
  [<ffffffff8105072e>] ? print_oops_end_marker+0x1e/0x20
  [<ffffffff813fb588>] oops_end+0xb1/0xc1
  [<ffffffff810339c6>] no_context+0x1ef/0x1fe
  [<ffffffff81033c0d>] __bad_area_nosemaphore+0x17e/0x1a1
  [<ffffffff81096912>] ? handle_IRQ_event+0xa2/0x1a5
  [<ffffffff81033ca6>] bad_area_nosemaphore+0xe/0x10
  [<ffffffff813fc9f2>] do_page_fault+0x157/0x275
  [<ffffffff813faac5>] page_fault+0x25/0x30
  [<ffffffffa0054ca0>] ? ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
  [<ffffffffa00575c8>] ixgbe_clean_rxonly+0x6b/0xbd [ixgbe]
  [<ffffffff8135aa1e>] net_rx_action+0xd0/0x248
  [<ffffffffa0054569>] ? napi_schedule+0x1d/0x22 [ixgbe]
  [<ffffffff81056772>] __do_softirq+0x10e/0x21a
  [<ffffffff81096912>] ? handle_IRQ_event+0xa2/0x1a5
  [<ffffffff81012ddc>] call_softirq+0x1c/0x30
  [<ffffffff81014533>] do_softirq+0x42/0x8b
  [<ffffffff810565b2>] irq_exit+0x3f/0x94
  [<ffffffff810142cd>] do_IRQ+0x94/0xab
  [<ffffffff810125d3>] ret_from_intr+0x0/0x11
  <EOI>  [<ffffffffa0059523>] ? ixgbe_disable_pcie_master+0x80/0xa9 [ixgbe]
  [<ffffffffa005ff89>] ? ixgbe_reset_hw_82599+0x63/0x19f [ixgbe]
  [<ffffffffa0059063>] ? ixgbe_init_hw_generic+0xf/0x1d [ixgbe]
  [<ffffffffa00530c3>] ? ixgbe_reset+0x1e/0xef [ixgbe]
  [<ffffffffa005657c>] ? ixgbe_down+0x1e6/0x24f [ixgbe]
  [<ffffffffa0057a13>] ? ixgbe_reset_task+0x0/0x24 [ixgbe]
  [<ffffffffa0057549>] ? ixgbe_reinit_locked+0x57/0x6b [ixgbe]
  [<ffffffffa0057a35>] ? ixgbe_reset_task+0x22/0x24 [ixgbe]
  [<ffffffff81063045>] ? worker_thread+0x19a/0x244
  [<ffffffff81066d5b>] ? autoremove_wake_function+0x0/0x38
  [<ffffffff81062eab>] ? worker_thread+0x0/0x244
  [<ffffffff81066ae4>] ? kthread+0x7b/0x83
  [<ffffffff81012cda>] ? child_rip+0xa/0x20
  [<ffffffff81066a69>] ? kthread+0x0/0x83
  [<ffffffff81012cd0>] ? child_rip+0x0/0x20
NULL pointer dereference at 00000000000000e8
IP: [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
PGD 0
Oops: 0000 [#2] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1/stat
CPU 2
Modules linked in: 8021q garp stp llc veth fuse arc4 michael_mic macvlan wanlink(P) pktgen sunrpc ipv6 dm_multipath uinput ixg]
Pid: 0, comm: swapper Tainted: P      D    2.6.31.7 #11 X8STi
RIP: 0010:[<ffffffffa0054ca0>]  [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
RSP: 0000:ffff880028071dc0  EFLAGS: 00010287
RAX: ffff88030a8c4000 RBX: 0000000000000000 RCX: ffff88030a8c4000
RDX: ffff880028071e64 RSI: 0000000000000000 RDI: ffff88032e933540
RBP: ffff880028071e40 R08: 000000000000004e R09: ffff88032e933430
R10: ffff88033fc08000 R11: ffff880028071f58 R12: ffffc90014de5000
R13: ffff88033043d240 R14: ffff88032dcd05c0 R15: 000000000000059c
FS:  0000000000000000(0000) GS:ffff88002806e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000e8 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8803324fc000, task ffff8803324c5e80)
Stack:
  0000000000000023 000000402490b3c0 ffff880028071e64 ffff88032e933540
<0> ffff880332744800 ffff88030a8c4000 0000000000000023 0000006300000000
<0> 0000000000000000 0000000000000000 ffff880028071e30 ffff88033043d240
Call Trace:
  <IRQ>
  [<ffffffffa00575c8>] ixgbe_clean_rxonly+0x6b/0xbd [ixgbe]
  [<ffffffff8135aa1e>] net_rx_action+0xd0/0x248
  [<ffffffffa0054569>] ? napi_schedule+0x1d/0x22 [ixgbe]
  [<ffffffff81056772>] __do_softirq+0x10e/0x21a
  [<ffffffff81096912>] ? handle_IRQ_event+0xa2/0x1a5
  [<ffffffff81012ddc>] call_softirq+0x1c/0x30
  [<ffffffff81014533>] do_softirq+0x42/0x8b
  [<ffffffff810565b2>] irq_exit+0x3f/0x94
  [<ffffffff810142cd>] do_IRQ+0x94/0xab
  [<ffffffff810125d3>] ret_from_intr+0x0/0x11
  <EOI>
  [<ffffffff81254bde>] ? acpi_idle_enter_simple+0xe6/0x117
  [<ffffffff81254bd7>] ? acpi_idle_enter_simple+0xdf/0x117
  [<ffffffff81254cdc>] ? acpi_idle_enter_bm+0xcd/0x251
  [<ffffffff813304bf>] ? cpuidle_idle_call+0x7c/0xb5
  [<ffffffff81010d6c>] ? cpu_idle+0x58/0xa8
  [<ffffffff813f48d4>] ? start_secondary+0x1a2/0x1a7
Code: f8 05 3d 00 01 00 00 44 0f 46 c0 48 8b 45 a8 44 0f b7 78 0c eb 0c 48 8b 55 a8 45 31 ff 44 0f b7 42 0c 49 8b 1c 24 49 8b
RIP  [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
  RSP <ffff880028071dc0>
CR2: 00000000000000e8
BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
IP: [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
PGD 0
Oops: 0000 [#3] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1/stat
CPU 1
Modules linked in: 8021q garp stp llc veth fuse arc4 michael_mic macvlan wanlink(P) pktgen sunrpc ipv6 dm_multipath uinput ixg]
Pid: 0, comm: swapper Tainted: P      D    2.6.31.7 #11 X8STi
RIP: 0010:[<ffffffffa0054ca0>]  [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
RSP: 0018:ffff880028054dc0  EFLAGS: 00010287
RAX: ffff88030a864000 RBX: 0000000000000000 RCX: ffff88030a864000
RDX: ffff880028054e64 RSI: 0000000000000000 RDI: ffff88032e933300
RBP: ffff880028054e40 R08: 000000000000004e R09: ffff88032e933070
R10: 0000000000000002 R11: ffffffff81387046 R12: ffffc90014df1000
R13: ffff88033043d2a0 R14: ffff88032dcd05c0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028051000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000e8 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8803324f0000, task ffff8803324c3f00)
Stack:
  ffffffff816988a0 000000402e933000 ffff880028054e64 ffff88032e933300
<0> ffff880332744800 ffff88030a864000 0000000028054e40 0000006300000000
<0> 0000000000000000 0000000000000000 ffff88032e933070 ffff88033043d2a0
Call Trace:
  <IRQ>
  [<ffffffffa00575c8>] ixgbe_clean_rxonly+0x6b/0xbd [ixgbe]
  [<ffffffff8135aa1e>] net_rx_action+0xd0/0x248
  [<ffffffffa0054569>] ? napi_schedule+0x1d/0x22 [ixgbe]
  [<ffffffff81056772>] __do_softirq+0x10e/0x21a
  [<ffffffff81096912>] ? handle_IRQ_event+0xa2/0x1a5
  [<ffffffff81012ddc>] call_softirq+0x1c/0x30
  [<ffffffff81014533>] do_softirq+0x42/0x8b
  [<ffffffff810565b2>] irq_exit+0x3f/0x94
  [<ffffffff810142cd>] do_IRQ+0x94/0xab
  [<ffffffff810125d3>] ret_from_intr+0x0/0x11
  <EOI>
  [<ffffffff81254bde>] ? acpi_idle_enter_simple+0xe6/0x117
  [<ffffffff81254bd7>] ? acpi_idle_enter_simple+0xdf/0x117
  [<ffffffff81254cdc>] ? acpi_idle_enter_bm+0xcd/0x251
  [<ffffffff813304bf>] ? cpuidle_idle_call+0x7c/0xb5
  [<ffffffff81010d6c>] ? cpu_idle+0x58/0xa8
  [<ffffffff813f48d4>] ? start_secondary+0x1a2/0x1a7
Code: f8 05 3d 00 01 00 00 44 0f 46 c0 48 8b 45 a8 44 0f b7 78 0c eb 0c 48 8b 55 a8 45 31 ff 44 0f b7 42 0c 49 8b 1c 24 49 8b
RIP  [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
  RSP <ffff880028054dc0>
CR2: 00000000000000e8

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: ixgbe NULL pointer dereference on OOM condition, 2.6.31.7
From: Brandeburg, Jesse @ 2010-01-13  2:13 UTC
  To: Ben Greear; +Cc: NetDev, jesse.brandeburg

On Mon, 4 Jan 2010, Ben Greear wrote:

> This is on a hacked 2.6.31.7 kernel.  I'm testing an application that creates
> 30,000+ TCP connections (to self).  The system is 64-bit with 12GB of RAM, but
> it can still run out of usable RAM (say, when I start another 10k connections
> to bring it up to 40k).
> 
> It looks like something in ixgbe isn't properly checking for inability
> to allocate (or to have previously allocated) an skb, or perhaps some other
> chunk of memory:
> 
> [root@ct503-10G-09 ~]# BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
> IP: [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]

Hi Ben, thanks for the report. Is there a chance you can run gdb on your
kernel (was it compiled with debug info?) and check the output of:
gdb> l *(ixgbe_clean_rx_irq+0xe4)

Sorry I'm so slow to respond.

It seems there are some unwind problems after the recent round of patches 
to remove skb_dma_map/unmap, but those were only introduced in 2.6.33-rc1.  
Before that we weren't aware of any failure path issues.

I'm building a kernel now to see if I can figure out the offset where 
you're showing the problem.

Jesse


* Re: ixgbe NULL pointer dereference on OOM condition, 2.6.31.7
From: Ben Greear @ 2010-01-13  5:28 UTC
  To: Brandeburg, Jesse; +Cc: NetDev

On 01/12/2010 06:13 PM, Brandeburg, Jesse wrote:
> On Mon, 4 Jan 2010, Ben Greear wrote:
>
>> This is on a hacked 2.6.31.7 kernel.  I'm testing an application that creates
>> 30,000+ TCP connections (to self).  The system is 64-bit with 12GB of RAM, but
>> it can still run out of usable RAM (say, when I start another 10k connections
>> to bring it up to 40k).
>>
>> It looks like something in ixgbe isn't properly checking for inability
>> to allocate (or to have previously allocated) an skb, or perhaps some other
>> chunk of memory:
>>
>> [root@ct503-10G-09 ~]# BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
>> IP: [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
>
> Hi Ben, thanks for the report, is there a chance you can run gdb on your
> kernel (was it compiled with debug info?) and check the
> gdb>  l *(ixgbe_clean_rx_irq+0xe4)
>
> Sorry I'm so slow to respond.
>
> it seems there are some unwind problems after the recent round of patches
> to remove skb_dma_map/unmap, but those were only introduced in 2.6.33-rc1.
> Before that we weren't aware of any failure path issues.
>
> I'm building a kernel now to see if I can figure out the offset where
> you're showing the problem.

I don't have symbols in mine currently, but I can recompile tomorrow and attempt to
reproduce it.

I have a few patches from Intel developers in the ixgbe driver, so it's not
stock anymore.

I don't expect you to want to download my tree, but just in case you do, it's
at:

git clone git://dmz1.candelatech.com/linux-2.6.dev.31.y

I was using the 64-bit config file found in the /configs dir.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: ixgbe NULL pointer dereference on OOM condition, 2.6.31.7
From: Ben Greear @ 2010-01-13 21:18 UTC
  To: Brandeburg, Jesse; +Cc: NetDev

On 01/12/2010 06:13 PM, Brandeburg, Jesse wrote:
> On Mon, 4 Jan 2010, Ben Greear wrote:
>
>> This is on a hacked 2.6.31.7 kernel.  I'm testing an application that creates
>> 30,000+ TCP connections (to self).  The system is 64-bit with 12GB of RAM, but
>> it can still run out of usable RAM (say, when I start another 10k connections
>> to bring it up to 40k).
>>
>> It looks like something in ixgbe isn't properly checking for inability
>> to allocate (or to have previously allocated) an skb, or perhaps some other
>> chunk of memory:
>>
>> [root@ct503-10G-09 ~]# BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
>> IP: [<ffffffffa0054ca0>] ixgbe_clean_rx_irq+0xe4/0x522 [ixgbe]
>
> Hi Ben, thanks for the report, is there a chance you can run gdb on your
> kernel (was it compiled with debug info?) and check the
> gdb>  l *(ixgbe_clean_rx_irq+0xe4)
>
> Sorry I'm so slow to respond.
>
> it seems there are some unwind problems after the recent round of patches
> to remove skb_dma_map/unmap, but those were only introduced in 2.6.33-rc1.
> Before that we weren't aware of any failure path issues.
>
> I'm building a kernel now to see if I can figure out the offset where
> you're showing the problem.

I was able to reproduce this against 2.6.31.9 plus my local hacks, but it took
around 3 hours of 30k connections and intermittent serious memory pressure.

This appears to be the same place as the previous crash (there were no
ixgbe changes between .7 and .9 as far as I know).

# BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
IP: [<ffffffffa005760b>] ixgbe_clean_rx_irq+0xdf/0x525 [ixgbe]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1/stat
CPU 6
Modules linked in: arc4 michael_mic wanlink nfs lockd fscache nfs_acl auth_rpcgss 8021q garp stp llc veth fuse macvlan pktgen ]
Pid: 0, comm: swapper Not tainted 2.6.31.9 #29 X8STi
RIP: 0010:[<ffffffffa005760b>]  [<ffffffffa005760b>] ixgbe_clean_rx_irq+0xdf/0x525 [ixgbe]
RSP: 0018:ffff8800280e5d90  EFLAGS: 00010287
RAX: 0000000000000042 RBX: 0000000000000000 RCX: ffffc90018ee3000
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffff88033084e480
RBP: ffff8800280e5e20 R08: 0000000000000000 R09: ffff88033084e380
R10: 0000000000000501 R11: ffff8800280e5dd0 R12: ffff88033040c2a0
R13: ffff88032e8f85c0 R14: ffff88030d074000 R15: 000000000000059c
FS:  0000000000000000(0000) GS:ffff8800280e2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000e8 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88033255a000, task ffff88033250de80)
Stack:
  ffff8800280e5e30 ffffffffa0053cce 00000040280e5dd0 ffff8800280e5e3c
<0> ffff880332744800 ffff88033084e490 ffff88033084e480 ffff88032e8f8000
<0> 00000000280e5e01 0000006300000000 0000000000000000 0000000000000000
Call Trace:
  <IRQ>
  [<ffffffffa0053cce>] ? ixgbe_clean_tx_irq+0x125/0x3e5 [ixgbe]
  [<ffffffffa0057ab8>] ixgbe_clean_rxonly+0x67/0xe2 [ixgbe]
  [<ffffffff81351ba3>] net_rx_action+0xab/0x24d
  [<ffffffffa0053285>] ? napi_schedule+0x1b/0x1d [ixgbe]
  [<ffffffff81055206>] __do_softirq+0x114/0x220
  [<ffffffff81093bb5>] ? handle_IRQ_event+0x92/0x18a
  [<ffffffff81012d9c>] call_softirq+0x1c/0x30
  [<ffffffff81014306>] do_softirq+0x42/0x88
  [<ffffffff81055420>] irq_exit+0x3f/0x8f
  [<ffffffff81013a47>] do_IRQ+0xa0/0xb7
  [<ffffffff81012593>] ret_from_intr+0x0/0x11
  <EOI>
  [<ffffffff8124ecc8>] ? acpi_idle_enter_simple+0x10f/0x143
  [<ffffffff8124ecc1>] ? acpi_idle_enter_simple+0x108/0x143
  [<ffffffff8124e9d1>] ? acpi_idle_enter_bm+0xd3/0x2bb
  [<ffffffff81096982>] ? rcu_needs_cpu+0x32/0x43
  [<ffffffff81328772>] ? cpuidle_idle_call+0x94/0xca
  [<ffffffff81010c37>] ? cpu_idle+0x58/0xc6
  [<ffffffff813ebd34>] ? start_secondary+0x19c/0x1a0
Code: 25 e0 7f 00 00 ba 00 01 00 00 45 0f b7 7e 0c c1 f8 05 3d 00 01 00 00 0f 47 c2 eb 08 41 0f b7 46 0c 45 31 ff 48 8b 19 48
RIP  [<ffffffffa005760b>] ixgbe_clean_rx_irq+0xdf/0x525 [ixgbe]
  RSP <ffff8800280e5d90>
CR2: 00000000000000e8
---[ end trace 7c6f3b3b09f60762 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G      D    2.6.31.9 #29
Call Trace:
  <IRQ>  [<ffffffff813efa5b>] panic+0x84/0x136
  [<ffffffff813f2e26>] oops_end+0xb1/0xc1
  [<ffffffff810328fa>] no_context+0x1f1/0x200
  [<ffffffff813f1d39>] ? _spin_unlock+0x2a/0x35
  [<ffffffff810955d6>] ? handle_edge_irq+0x105/0x10e
  [<ffffffff81032aa7>] __bad_area_nosemaphore+0x19e/0x1c4
  [<ffffffff8105542d>] ? irq_exit+0x4c/0x8f
  [<ffffffff81013a47>] ? do_IRQ+0xa0/0xb7
  [<ffffffff81012593>] ? ret_from_intr+0x0/0x11
  [<ffffffff81032adb>] bad_area_nosemaphore+0xe/0x10
  [<ffffffff813f4303>] do_page_fault+0x15f/0x296
  [<ffffffff813f22f5>] page_fault+0x25/0x30
  [<ffffffffa005760b>] ? ixgbe_clean_rx_irq+0xdf/0x525 [ixgbe]
  [<ffffffffa0053cce>] ? ixgbe_clean_tx_irq+0x125/0x3e5 [ixgbe]
  [<ffffffffa0057ab8>] ixgbe_clean_rxonly+0x67/0xe2 [ixgbe]
  [<ffffffff81351ba3>] net_rx_action+0xab/0x24d
  [<ffffffffa0053285>] ? napi_schedule+0x1b/0x1d [ixgbe]
  [<ffffffff81055206>] __do_softirq+0x114/0x220
  [<ffffffff81093bb5>] ? handle_IRQ_event+0x92/0x18a
  [<ffffffff81012d9c>] call_softirq+0x1c/0x30
  [<ffffffff81014306>] do_softirq+0x42/0x88
  [<ffffffff81055420>] irq_exit+0x3f/0x8f
  [<ffffffff81013a47>] do_IRQ+0xa0/0xb7
  [<ffffffff81012593>] ret_from_intr+0x0/0x11
  <EOI>  [<ffffffff8124ecc8>] ? acpi_idle_enter_simple+0x10f/0x143
  [<ffffffff8124ecc1>] ? acpi_idle_enter_simple+0x108/0x143
  [<ffffffff8124e9d1>] ? acpi_idle_enter_bm+0xd3/0x2bb
  [<ffffffff81096982>] ? rcu_needs_cpu+0x32/0x43
  [<ffffffff81328772>] ? cpuidle_idle_call+0x94/0xca
  [<ffffffff81010c37>] ? cpu_idle+0x58/0xc6
  [<ffffffff813ebd34>] ? start_secondary+0x19c/0x1a0


I re-compiled with symbols, and ran gdb against that new ixgbe.ko
file:

(gdb) l *(ixgbe_clean_rx_irq+0xdf)
0x662f is in ixgbe_clean_rx_irq (/home/greearb/git/linux-2.6.dev.31.y/drivers/net/ixgbe/ixgbe_main.c:744).
739				len = le16_to_cpu(rx_desc->wb.upper.length);
740			}
741	
742			cleaned = true;
743			skb = rx_buffer_info->skb;
744			prefetch(skb->data - NET_IP_ALIGN);
745			rx_buffer_info->skb = NULL;
746	
747			if (rx_buffer_info->dma) {
748				pci_unmap_single(pdev, rx_buffer_info->dma,
(gdb)


For the previous crash that started this thread, the listing is this:

(gdb) l *(ixgbe_clean_rx_irq+0xe4)
0x6634 is in ixgbe_clean_rx_irq (/home/greearb/git/linux-2.6.dev.31.y/drivers/net/ixgbe/ixgbe_main.c:744).
739				len = le16_to_cpu(rx_desc->wb.upper.length);
740			}
741	
742			cleaned = true;
743			skb = rx_buffer_info->skb;
744			prefetch(skb->data - NET_IP_ALIGN);
745			rx_buffer_info->skb = NULL;
746	
747			if (rx_buffer_info->dma) {
748				pci_unmap_single(pdev, rx_buffer_info->dma,


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

