netdev.vger.kernel.org archive mirror
* ixgbe: panic in ixgbe_clean_rx_irq()
@ 2009-07-22 12:44 Jesper Dangaard Brouer
  2009-07-22 18:13 ` Waskiewicz Jr, Peter P
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2009-07-22 12:44 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: netdev@vger.kernel.org

Hi Peter,

Got a panic from the ixgbe driver on my 82599-based NICs, while running
two pktgen tests (1500-byte packets) against the machine.

Kernel: 2.6.31-rc1-net-2.6-00122-ge594e96 with preemption

The panic occurs in ixgbe_clean_rx_irq+0x319/0x53b

Using objdump I can see that it seems to occur in an inlined function
ixgbe_transform_rsc_queue() in objdump "line" "2811:"
(ixgbe_clean_rx_irq starts 0x24f8 + 0x319 = 0x2811)
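
The address arithmetic above can be checked mechanically. The following is a minimal sketch in C that uses only the numbers from this report (function start 0x24f8 in the objdump, oops offset 0x319, function size 0x53b). Both helper names are made up for illustration, and the second check assumes that the module text is loaded on a 4 KiB page boundary, which the report does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: map the "symbol+0xOFF/0xSIZE" notation from an oops
 * onto the address column of the objdump output for the same object file.
 * Returns false if the offset does not lie inside the function.
 */
static bool oops_to_objdump_addr(uint64_t func_start, uint64_t offset,
                                 uint64_t func_size, uint64_t *addr)
{
        if (offset >= func_size)
                return false;
        *addr = func_start + offset;
        return true;
}

/*
 * Hypothetical sanity check: compare the low 12 bits of the runtime RIP
 * with the objdump address. This only means something under the assumption
 * that the section is loaded page aligned (4 KiB pages).
 */
static bool same_page_offset(uint64_t runtime_rip, uint64_t objdump_addr)
{
        return (runtime_rip & 0xfff) == (objdump_addr & 0xfff);
}
```

With the values from this report, oops_to_objdump_addr(0x24f8, 0x319, 0x53b, &addr) gives 0x2811, and the RIP from the oops (ffffffffa001a811) ends in the same 0x811 page offset.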

static inline struct sk_buff *ixgbe_transform_rsc_queue(struct sk_buff *skb)
{
        unsigned int frag_list_size = 0;

        while (skb->prev) {
    2811:       48 8b 7b 08             mov    0x8(%rbx),%rdi
    2815:       48 85 ff                test   %rdi,%rdi
    2818:       75 e9                   jne    2803 <ixgbe_clean_rx_irq+0x30b>
                frag_list_size += skb->len;
                skb->prev = NULL;
                skb = prev;
        }

        skb_shinfo(skb)->frag_list = skb->next;
    281a:       48 8b 03                mov    (%rbx),%rax
    281d:       8b 8b c8 00 00 00       mov    0xc8(%rbx),%ecx
    2823:       48 8b 93 d0 00 00 00    mov    0xd0(%rbx),%rdx
    282a:       48 89 44 0a 20          mov    %rax,0x20(%rdx,%rcx,1)
        skb->next = NULL;
    282f:       48 c7 03 00 00 00 00    movq   $0x0,(%rbx)
        skb->len += frag_list_size;
    2836:       01 73 60                add    %esi,0x60(%rbx)
        skb->data_len += frag_list_size;
    2839:       01 73 64                add    %esi,0x64(%rbx)
        skb->truesize += frag_list_size;
    283c:       01 b3 e0 00 00 00       add    %esi,0xe0(%rbx)
                }
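
To make the loop in the listing easier to reason about, here is a small userspace model of it. This is a sketch under assumptions, not the driver code: struct mock_skb and mock_walk_to_head() are hypothetical stand-ins, only the fields the loop touches are modelled, and the local variable prev is assumed because its declaration is not visible in the interleaved objdump output.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the fields the loop uses. */
struct mock_skb {
        struct mock_skb *next;
        struct mock_skb *prev;
        unsigned int len;
};

/*
 * Walk back along ->prev until a buffer with no ->prev is found, clearing
 * each link on the way and summing the lengths of the buffers that were
 * passed. Returns the buffer the walk ends on.
 */
static struct mock_skb *mock_walk_to_head(struct mock_skb *skb,
                                          unsigned int *frag_list_size)
{
        *frag_list_size = 0;

        while (skb->prev) {
                /* Assumed declaration, not shown in the objdump listing. */
                struct mock_skb *prev = skb->prev;

                *frag_list_size += skb->len;
                skb->prev = NULL;
                skb = prev;     /* a bogus ->prev is dereferenced next pass */
        }
        return skb;
}
```

The model shows why the loop test is a likely place to fault: whatever value was stored in ->prev becomes the next skb, so one corrupted link is dereferenced on the following pass through the loop condition.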


Let me know if you want the objdump, it's quite large.
-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer



root@firesoul:~# 
[ 9437.550709] general protection fault: 0000 [#1] PREEMPT SMP 
[ 9437.567818] last sysfs file: /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
[ 9437.590784] CPU 6 
[ 9437.596933] Modules linked in: asus_atk0110 ixgbe hwmon mdio r8169
[ 9437.615601] Pid: 0, comm: swapper Not tainted 2.6.31-rc1-net-2.6-00122-ge594e96 #8 System Product Name
[ 9437.643775] RIP: 0010:[<ffffffffa001a811>]  [<ffffffffa001a811>] ixgbe_clean_rx_irq+0x319/0x53b [ixgbe]
[ 9437.672232] RSP: 0018:ffff880001925dc0  EFLAGS: 00010202
[ 9437.688275] RAX: 0000000000000000 RBX: 0100000000000000 RCX: ffff8800b2974000
[ 9437.709794] RDX: ffffc90011db7d20 RSI: 00000000000005dc RDI: 0100000000000000
[ 9437.731310] RBP: ffff880001925e50 R08: ffff8800af797c00 R09: ffff880080000000
[ 9437.752825] R10: ffff8800af797548 R11: ffff880001925da0 R12: ffffc90011db7cf8
[ 9437.774341] R13: ffff88003766c640 R14: ffff8800ba8ea580 R15: 0000000040000073
[ 9437.795856] FS:  0000000000000000(0000) GS:ffff880001922000(0000) knlGS:0000000000000000
[ 9437.820385] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 9437.837724] CR2: 00007f54f87a1315 CR3: 0000000001001000 CR4: 00000000000006a0
[ 9437.859244] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9437.880759] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 9437.902276] Process swapper (pid: 0, threadinfo ffff8800bef02000, task ffff8800beef1e00)
[ 9437.926803] Stack:
[ 9437.932942]  0000004001925de0 ffff880001925e6c ffff8800bb2cf500 ffff8800be572000
[ 9437.954743] <0> ffff8800b2976530 ffff8800b2976540 00000254ad917000 0000002a00000000
[ 9437.978016] <0> 000005dc00000002 000005b200000001 ffff8800bb2cf510 0000000000000254
[ 9438.001963] Call Trace:
[ 9438.009411]  <IRQ> 
[ 9438.015833]  [<ffffffffa001aa81>] ixgbe_clean_rxonly+0x4e/0xd2 [ixgbe]
[ 9438.035534]  [<ffffffff8138a7b5>] net_rx_action+0xb2/0x234
[ 9438.052099]  [<ffffffff8103fb35>] __do_softirq+0x10c/0x211
[ 9438.068668]  [<ffffffff8100cc2c>] call_softirq+0x1c/0x28
[ 9438.084717]  [<ffffffff8100de84>] do_softirq+0x34/0x72
[ 9438.100243]  [<ffffffff8103f83f>] irq_exit+0x36/0x85
[ 9438.115250]  [<ffffffff8100d75f>] do_IRQ+0xa6/0xbd
[ 9438.129738]  [<ffffffff8100c493>] ret_from_intr+0x0/0xa
[ 9438.145524]  <EOI> 
[ 9438.151946]  [<ffffffff81012355>] ? mwait_idle+0x89/0x9f
[ 9438.167991]  [<ffffffff81012348>] ? mwait_idle+0x7c/0x9f
[ 9438.187996]  [<ffffffff8142c948>] ? atomic_notifier_call_chain+0xf/0x11
[ 9438.207953]  [<ffffffff8100aedb>] ? cpu_idle+0x4f/0xb3
[ 9438.223479]  [<ffffffff81424478>] ? start_secondary+0x17f/0x184
[ 9438.241350] Code: 48 eb 09 48 6b 55 c8 28 49 03 55 08 41 f6 c7 02 74 60 31 f6 48 83 7b 08 00 75 10 eb 3f 03 73 60 48 c7 43 08 00 00 00 00 48 89 fb <48> 8b 7b 08 48 85 ff 75 e9 48 8b 03 8b 8b c8 00 00 00 48 8b 93
[ 9438.298386] RIP  [<ffffffffa001a811>] ixgbe_clean_rx_irq+0x319/0x53b [ixgbe]
[ 9438.319652]  RSP <ffff880001925dc0>
[ 9438.330593] ---[ end trace f60de63a65a43085 ]---
[ 9438.344593] Kernel panic - not syncing: Fatal exception in interrupt
[ 9438.363810] Pid: 0, comm: swapper Tainted: G      D    2.6.31-rc1-net-2.6-00122-ge594e96 #8
[ 9438.389159] Call Trace:
[ 9438.396655]  <IRQ>  [<ffffffff81427c6b>] panic+0xaa/0x155
[ 9438.413056]  [<ffffffff8103f88c>] ? irq_exit+0x83/0x85
[ 9438.428627]  [<ffffffff8100c493>] ? ret_from_intr+0x0/0xa
[ 9438.444975]  [<ffffffff8142b039>] ? oops_end+0x6b/0xba
[ 9438.460543]  [<ffffffff8142b078>] oops_end+0xaa/0xba
[ 9438.475589]  [<ffffffff8100f0d1>] die+0x55/0x5e
[ 9438.489332]  [<ffffffff8142ac8e>] do_general_protection+0x123/0x12b
[ 9438.508288]  [<ffffffff813b39ad>] ? ip_rcv+0x2a7/0x2de
[ 9438.523853]  [<ffffffff8142a5df>] general_protection+0x1f/0x30
[ 9438.541509]  [<ffffffffa001a811>] ? ixgbe_clean_rx_irq+0x319/0x53b [ixgbe]
[ 9438.562298]  [<ffffffffa001aa81>] ixgbe_clean_rxonly+0x4e/0xd2 [ixgbe]
[ 9438.582036]  [<ffffffff8138a7b5>] net_rx_action+0xb2/0x234
[ 9438.598641]  [<ffffffff8103fb35>] __do_softirq+0x10c/0x211
[ 9438.615250]  [<ffffffff8100cc2c>] call_softirq+0x1c/0x28
[ 9438.631337]  [<ffffffff8100de84>] do_softirq+0x34/0x72
[ 9438.646904]  [<ffffffff8103f83f>] irq_exit+0x36/0x85
[ 9438.661949]  [<ffffffff8100d75f>] do_IRQ+0xa6/0xbd
[ 9438.676476]  [<ffffffff8100c493>] ret_from_intr+0x0/0xa
[ 9438.692303]  <EOI>  [<ffffffff81012355>] ? mwait_idle+0x89/0x9f
[ 9438.710264]  [<ffffffff81012348>] ? mwait_idle+0x7c/0x9f
[ 9438.726354]  [<ffffffff8142c948>] ? atomic_notifier_call_chain+0xf/0x11
[ 9438.746348]  [<ffffffff8100aedb>] ? cpu_idle+0x4f/0xb3
[ 9438.761914]  [<ffffffff81424478>] ? start_secondary+0x17f/0x184
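
A note on why this appears as a general protection fault and not a page fault: in the register dump above, RBX and RDI both hold 0100000000000000, and the faulting instruction is the mov 0x8(%rbx),%rdi at objdump address 2811. On x86-64, a memory access through a non-canonical address raises a general protection fault. The sketch below checks whether an address is canonical. It assumes 48-bit virtual addresses (4-level paging), which is an assumption about this machine and not something stated in the report, and the function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper. With 48-bit virtual addresses, an address is
 * canonical only if bits 63..47 are all zeros or all ones.
 */
static bool is_canonical_48(uint64_t addr)
{
        uint64_t top = addr >> 47;      /* the 17 bits 63..47 */

        return top == 0 || top == 0x1ffff;
}
```

Under that assumption, is_canonical_48(0x0100000000000008ULL) is false for the address the faulting mov would read, while a kernel pointer from the same dump such as 0xffff8800b2974000 (RCX) passes the check.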




* Re: ixgbe: panic in ixgbe_clean_rx_irq()
  2009-07-22 12:44 ixgbe: panic in ixgbe_clean_rx_irq() Jesper Dangaard Brouer
@ 2009-07-22 18:13 ` Waskiewicz Jr, Peter P
  2009-07-23  8:46   ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Waskiewicz Jr, Peter P @ 2009-07-22 18:13 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Waskiewicz Jr, Peter P, netdev@vger.kernel.org

On Wed, 22 Jul 2009, Jesper Dangaard Brouer wrote:

> Hi Peter,
> 
> Got a panic from the ixgbe driver on my 82599-based NICs, while running
> two pktgen tests (1500-byte packets) against the machine.
> 
> Kernel: 2.6.31-rc1-net-2.6-00122-ge594e96 with preemption
> 
> The panic occurs in ixgbe_clean_rx_irq+0x319/0x53b
> 
> Using objdump I can see that it seems to occur in an inlined function
> ixgbe_transform_rsc_queue() in objdump "line" "2811:"
> (ixgbe_clean_rx_irq starts 0x24f8 + 0x319 = 0x2811)
> 
> static inline struct sk_buff *ixgbe_transform_rsc_queue(struct sk_buff *skb)
> {
>         unsigned int frag_list_size = 0;
> 
>         while (skb->prev) {
>     2811:       48 8b 7b 08             mov    0x8(%rbx),%rdi
>     2815:       48 85 ff                test   %rdi,%rdi
>     2818:       75 e9                   jne    2803 <ixgbe_clean_rx_irq+0x30b>
>                 frag_list_size += skb->len;
>                 skb->prev = NULL;
>                 skb = prev;
>         }
> 
>         skb_shinfo(skb)->frag_list = skb->next;
>     281a:       48 8b 03                mov    (%rbx),%rax
>     281d:       8b 8b c8 00 00 00       mov    0xc8(%rbx),%ecx
>     2823:       48 8b 93 d0 00 00 00    mov    0xd0(%rbx),%rdx
>     282a:       48 89 44 0a 20          mov    %rax,0x20(%rdx,%rcx,1)
>         skb->next = NULL;
>     282f:       48 c7 03 00 00 00 00    movq   $0x0,(%rbx)
>         skb->len += frag_list_size;
>     2836:       01 73 60                add    %esi,0x60(%rbx)
>         skb->data_len += frag_list_size;
>     2839:       01 73 64                add    %esi,0x64(%rbx)
>         skb->truesize += frag_list_size;
>     283c:       01 b3 e0 00 00 00       add    %esi,0xe0(%rbx)
>                 }
> 
> 
> Let me know if you want the objdump, it's quite large.

Thanks for the report Jesper.  Hold onto the objdump for now.  If I need 
it, I'll let you know (and give you a place to upload it to).

For the time being, I have one of my validation guys trying to get a repro 
in our lab.  So please consider this actively worked on.  Hopefully I'll 
have something for you soon.

Cheers,
-PJ Waskiewicz


* Re: ixgbe: panic in ixgbe_clean_rx_irq()
  2009-07-22 18:13 ` Waskiewicz Jr, Peter P
@ 2009-07-23  8:46   ` Jesper Dangaard Brouer
  2009-07-23 15:21     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2009-07-23  8:46 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: netdev@vger.kernel.org

On Wed, 2009-07-22 at 11:13 -0700, Waskiewicz Jr, Peter P wrote:
> On Wed, 22 Jul 2009, Jesper Dangaard Brouer wrote:
> 
> > Hi Peter,
> > 
> > Got a panic from the ixgbe driver on my 82599-based NICs, while running
> > two pktgen tests (1500-byte packets) against the machine.
> > 
> > Kernel: 2.6.31-rc1-net-2.6-00122-ge594e96 with preemption

Got a new panic.  The strange thing is that it's not happening in the same
place... and the annoying thing is that it's hard to reproduce, as I have
to run pktgen for a very long time before it dies.

I'm starting to suspect it could be related to CPU freq scaling, as the
last sysfs file is /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
and the call stack originates from mwait_idle().


[27925.077996] general protection fault: 0000 [#1] PREEMPT SMP 
[27925.095080] last sysfs file: /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
[27925.118042] CPU 7 
[27925.124196] Modules linked in: asus_atk0110 ixgbe hwmon mdio r8169
[27925.142863] Pid: 0, comm: swapper Not tainted 2.6.31-rc1-net-2.6-00122-ge594e96 #8 System Product Name
[27925.171033] RIP: 0010:[<ffffffff81382b29>]  [<ffffffff81382b29>] skb_release_head_state+0x69/0xba
[27925.197904] RSP: 0018:ffff880001942d70  EFLAGS: 00010286
[27925.213938] RAX: 0000000000000000 RBX: ffff8800b0022100 RCX: 0000000000000400
[27925.235443] RDX: 00000000000003d6 RSI: 00000000b9c03000 RDI: d100000000000000
[27925.256958] RBP: ffff880001942d80 R08: 00000000000002a3 R09: ffffc90011e6a428
[27925.278474] R10: 0000000101a5abfb R11: ffffffff813828c5 R12: ffff8800bac7c580
[27925.299990] R13: 0000000000002a40 R14: 00000000000002a5 R15: 00000000000002a4
[27925.321504] FS:  0000000000000000(0000) GS:ffff88000193f000(0000) knlGS:0000000000000000
[27925.346030] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[27925.363373] CR2: 00007fecde5734a8 CR3: 00000000a5015000 CR4: 00000000000006a0
[27925.384878] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[27925.406380] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[27925.427884] Process swapper (pid: 0, threadinfo ffff8800bef0c000, task ffff8800beef3480)
[27925.452409] Stack:
[27925.458553]  00000000000002a3 ffff8800b0022100 ffff880001942da0 ffffffff81382854
[27925.480328] <0> ffffc90011e6a428 ffffc900119ee480 ffff880001942db0 ffffffff813828f5
[27925.503588] <0> ffff880001942dc0 ffffffff8138b441 ffff880001942de0 ffffffffa001ab3e
[27925.527521] Call Trace:
[27925.534971]  <IRQ> 
[27925.541385]  [<ffffffff81382854>] __kfree_skb+0x11/0x87
[27925.557169]  [<ffffffff813828f5>] consume_skb+0x2b/0x2d
[27925.572943]  [<ffffffff8138b441>] dev_kfree_skb_any+0x2e/0x30
[27925.590287]  [<ffffffffa001ab3e>] ixgbe_unmap_and_free_tx_resource+0x39/0x4c [ixgbe]
[27925.613772]  [<ffffffffa001acb5>] ixgbe_clean_tx_irq+0xcf/0x366 [ixgbe]
[27925.633721]  [<ffffffffa001af88>] ixgbe_clean_txonly+0x3c/0xbd [ixgbe]
[27925.653410]  [<ffffffff8138a7b5>] net_rx_action+0xb2/0x234
[27925.669966]  [<ffffffff8103fb35>] __do_softirq+0x10c/0x211
[27925.686521]  [<ffffffff8100cc2c>] call_softirq+0x1c/0x28
[27925.702556]  [<ffffffff8100de84>] do_softirq+0x34/0x72
[27925.718070]  [<ffffffff8103f83f>] irq_exit+0x36/0x85
[27925.733065]  [<ffffffff8100d75f>] do_IRQ+0xa6/0xbd
[27925.747540]  [<ffffffff8100c493>] ret_from_intr+0x0/0xa
[27925.767191]  <EOI> 
[27925.773603]  [<ffffffff81012355>] ? mwait_idle+0x89/0x9f
[27925.789646]  [<ffffffff81012348>] ? mwait_idle+0x7c/0x9f
[27925.805682]  [<ffffffff8142c948>] ? atomic_notifier_call_chain+0xf/0x11
[27925.825624]  [<ffffffff8100aedb>] ? cpu_idle+0x4f/0xb3
[27925.841137]  [<ffffffff81424478>] ? start_secondary+0x17f/0x184
[27925.858994] Code: 00 ff 03 74 11 be 8e 01 00 00 48 c7 c7 fc d3 5f 81 e8 7f 79 cb ff 48 89 df ff 93 80 00 00 00 48 8b bb 88 00 00 00 48 85 ff 74 0f <f0> ff 0f 0f 94 c0 84 c0 74 05 e8 68 1f 02 00 48 8b bb 90 00 00
[27925.915956] RIP  [<ffffffff81382b29>] skb_release_head_state+0x69/0xba
[27925.935656]  RSP <ffff880001942d70>
[27925.946574] ---[ end trace acd86b5373a70766 ]---
[27925.960578] Kernel panic - not syncing: Fatal exception in interrupt
[27925.979798] Pid: 0, comm: swapper Tainted: G      D    2.6.31-rc1-net-2.6-00122-ge594e96 #8
[27926.005157] Call Trace:
[27926.012654]  <IRQ>  [<ffffffff81427c6b>] panic+0xaa/0x155
[27926.029070]  [<ffffffff8103f88c>] ? irq_exit+0x83/0x85
[27926.044644]  [<ffffffff8100c493>] ? ret_from_intr+0x0/0xa
[27926.061002]  [<ffffffff8142b039>] ? oops_end+0x6b/0xba
[27926.076571]  [<ffffffff8142b078>] oops_end+0xaa/0xba
[27926.091628]  [<ffffffff8100f0d1>] die+0x55/0x5e
[27926.105374]  [<ffffffff8142ac8e>] do_general_protection+0x123/0x12b
[27926.124324]  [<ffffffff8142a1af>] ? _spin_unlock+0x2a/0x35
[27926.140936]  [<ffffffff8142a5df>] general_protection+0x1f/0x30
[27926.158595]  [<ffffffff813828c5>] ? __kfree_skb+0x82/0x87
[27926.174946]  [<ffffffff81382b29>] ? skb_release_head_state+0x69/0xba
[27926.194169]  [<ffffffff81382854>] __kfree_skb+0x11/0x87
[27926.210008]  [<ffffffff813828f5>] consume_skb+0x2b/0x2d
[27926.225834]  [<ffffffff8138b441>] dev_kfree_skb_any+0x2e/0x30
[27926.243231]  [<ffffffffa001ab3e>] ixgbe_unmap_and_free_tx_resource+0x39/0x4c [ixgbe]
[27926.266772]  [<ffffffffa001acb5>] ixgbe_clean_tx_irq+0xcf/0x366 [ixgbe]
[27926.286785]  [<ffffffffa001af88>] ixgbe_clean_txonly+0x3c/0xbd [ixgbe]
[27926.306524]  [<ffffffff8138a7b5>] net_rx_action+0xb2/0x234
[27926.323147]  [<ffffffff8103fb35>] __do_softirq+0x10c/0x211
[27926.339765]  [<ffffffff8100cc2c>] call_softirq+0x1c/0x28
[27926.355853]  [<ffffffff8100de84>] do_softirq+0x34/0x72
[27926.371417]  [<ffffffff8103f83f>] irq_exit+0x36/0x85
[27926.386466]  [<ffffffff8100d75f>] do_IRQ+0xa6/0xbd
[27926.400990]  [<ffffffff8100c493>] ret_from_intr+0x0/0xa
[27926.416819]  <EOI>  [<ffffffff81012355>] ? mwait_idle+0x89/0x9f
[27926.434794]  [<ffffffff81012348>] ? mwait_idle+0x7c/0x9f
[27926.450881]  [<ffffffff8142c948>] ? atomic_notifier_call_chain+0xf/0x11
[27926.470877]  [<ffffffff8100aedb>] ? cpu_idle+0x4f/0xb3
[27926.486454]  [<ffffffff81424478>] ? start_secondary+0x17f/0x184

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: ixgbe: panic in ixgbe_clean_rx_irq()
  2009-07-23  8:46   ` Jesper Dangaard Brouer
@ 2009-07-23 15:21     ` Jesper Dangaard Brouer
  2009-07-24  8:41       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2009-07-23 15:21 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: netdev@vger.kernel.org

On Thu, 2009-07-23 at 10:46 +0200, Jesper Dangaard Brouer wrote:
> > > Hi Peter,
> > > 
> > > Got a panic from the ixgbe driver on my 82599-based NICs, while running
> > > two pktgen tests (1500-byte packets) against the machine.
> > > 
> > > Kernel: 2.6.31-rc1-net-2.6-00122-ge594e96 with preemption
> 
> Got a new panic.  The strange thing is that it's not happening in the same
> place... and the annoying thing is that it's hard to reproduce, as I
> have to run pktgen for a very long time before it dies.

Got yet another panic... now it's in skb_put(), at the line
SKB_LINEAR_ASSERT(skb).

I'm going to run a test with a no-preempt kernel overnight...
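
For readers unfamiliar with the assertion: SKB_LINEAR_ASSERT(skb) triggers a BUG when the buffer is non-linear, that is, when skb->data_len is non-zero. The bytes around the trapping instruction in the Code: line of the trace at the end of this message (83 7f 64 00 74 04 <0f> 0b) decode to cmpl $0x0,0x64(%rdi); je; ud2, and 0x64 is the same offset that the earlier objdump listing shows for skb->data_len. The sketch below is a userspace model of that check only. The struct and function names are hypothetical, and the model returns false where the kernel would BUG.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two length fields of struct sk_buff. */
struct mock_skb_lens {
        unsigned int len;       /* total bytes in the buffer */
        unsigned int data_len;  /* bytes held outside the linear area */
};

/* A buffer is non-linear as soon as any bytes live outside the linear area. */
static bool mock_is_nonlinear(const struct mock_skb_lens *skb)
{
        return skb->data_len != 0;
}

/*
 * Model of the assertion at the top of skb_put(): appending to the linear
 * area is refused for a non-linear buffer. Returns false where the kernel
 * would hit BUG(); the tail/end bookkeeping of the real function is left out.
 */
static bool mock_skb_put(struct mock_skb_lens *skb, unsigned int len)
{
        if (mock_is_nonlinear(skb))
                return false;
        skb->len += len;
        return true;
}
```

In this model a buffer with data_len of 0 accepts the put, while any non-zero data_len is rejected, which matches the condition the oops reports for the skb handed to skb_put().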

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer

[  531.614873] ------------[ cut here ]------------
[  531.628869] kernel BUG at net/core/skbuff.c:1014!
[  531.643116] invalid opcode: 0000 [#1] PREEMPT SMP 
[  531.657774] last sysfs file: /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
[  531.680753] CPU 2 
[  531.686970] Modules linked in: asus_atk0110 ixgbe hwmon mdio r8169
[  531.705922] Pid: 0, comm: swapper Not tainted 2.6.31-rc1-net-2.6-00122-ge594e96 #8 System Product Name
[  531.734109] RIP: 0010:[<ffffffff81382538>]  [<ffffffff81382538>] skb_put+0x1d/0x89
[  531.757137] RSP: 0000:ffff8800018b1d90  EFLAGS: 00010286
[  531.773208] RAX: 0000000000000000 RBX: ffff8800a11b6200 RCX: 000000000000002a
[  531.794752] RDX: 0000000000000022 RSI: 000000000000002a RDI: ffff8800a11b6200
[  531.816292] RBP: ffff8800018b1db0 R08: ffff8800a107f400 R09: ffff880080000000
[  531.837833] R10: ffff8800a107d948 R11: 0100880000000000 R12: ffffc90011a90710
[  531.859375] R13: ffff8800376430c0 R14: ffff8800bb442580 R15: 0000000040000073
[  531.880916] FS:  0000000000000000(0000) GS:ffff8800018ae000(0000) knlGS:0000000000000000
[  531.905458] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  531.922825] CR2: 0000000000607000 CR3: 00000000a9ca7000 CR4: 00000000000006a0
[  531.944370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  531.965909] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  531.987452] Process swapper (pid: 0, threadinfo ffff8800beec8000, task ffff8800bee94380)
[  532.011995] Stack:
[  532.018162]  ffff8800018b1db0 ffffffff8138a699 ffff8800a11b6600 ffffc90011a906e8
[  532.040107] <0> ffff8800018b1e50 ffffffffa001a677 00000040018b1de0 ffff8800018b1e6c
[  532.063587] <0> ffff8800bbd47b00 ffff8800be5aa000 ffff8800bd790fa0 ffff8800bd790fa0
[  532.087770] Call Trace:
[  532.095243]  <IRQ> 
[  532.101724]  [<ffffffff8138a699>] ? napi_gro_receive+0x2a/0x2f
[  532.119359]  [<ffffffffa001a677>] ixgbe_clean_rx_irq+0x17f/0x53b [ixgbe]
[  532.139599]  [<ffffffffa001aa81>] ixgbe_clean_rxonly+0x4e/0xd2 [ixgbe]
[  532.159314]  [<ffffffff8138a7b5>] net_rx_action+0xb2/0x234
[  532.175908]  [<ffffffff8103fb35>] __do_softirq+0x10c/0x211
[  532.192503]  [<ffffffff8100cc2c>] call_softirq+0x1c/0x28
[  532.208577]  [<ffffffff8100de84>] do_softirq+0x34/0x72
[  532.224131]  [<ffffffff8103f83f>] irq_exit+0x36/0x85
[  532.239164]  [<ffffffff8100d75f>] do_IRQ+0xa6/0xbd
[  532.253677]  [<ffffffff8100c493>] ret_from_intr+0x0/0xa
[  532.273123]  <EOI> 
[  532.279611]  [<ffffffff81012355>] ? mwait_idle+0x89/0x9f
[  532.295681]  [<ffffffff81012348>] ? mwait_idle+0x7c/0x9f
[  532.311755]  [<ffffffff8142c948>] ? atomic_notifier_call_chain+0xf/0x11
[  532.331737]  [<ffffffff8100aedb>] ? cpu_idle+0x4f/0xb3
[  532.347288]  [<ffffffff81424478>] ? start_secondary+0x17f/0x184
[  532.365185] Code: e8 04 58 0a 00 0f 0b eb fe 48 89 c8 c9 c3 55 89 f1 48 89 e5 48 83 ec 20 4c 8b 87 d0 00 00 00 8b 97 c4 00 00 00 83 7f 64 00 74 04 <0f> 0b eb fe 8d 04 11 01 77 60 89 87 c4 00 00 00 3b 87 c8 00 00
[  532.425076] RIP  [<ffffffff81382538>] skb_put+0x1d/0x89
[  532.440935]  RSP <ffff8800018b1d90>
[  532.451539] ---[ end trace 2bb373a2a0219cb4 ]---
[  532.465527] Kernel panic - not syncing: Fatal exception in interrupt
[  532.484728] Pid: 0, comm: swapper Tainted: G      D    2.6.31-rc1-net-2.6-00122-ge594e96 #8
[  532.510052] Call Trace:
[  532.517535]  <IRQ>  [<ffffffff81427c6b>] panic+0xaa/0x155
[  532.533924]  [<ffffffff8103f88c>] ? irq_exit+0x83/0x85
[  532.549483]  [<ffffffff8100c493>] ? ret_from_intr+0x0/0xa
[  532.565815]  [<ffffffff8142b039>] ? oops_end+0x6b/0xba
[  532.581369]  [<ffffffff8142b078>] oops_end+0xaa/0xba
[  532.596403]  [<ffffffff8100f0d1>] die+0x55/0x5e
[  532.610136]  [<ffffffff8142aad9>] do_trap+0x110/0x11f
[  532.625430]  [<ffffffff8100d44f>] do_invalid_op+0x91/0x9a
[  532.641764]  [<ffffffff81382538>] ? skb_put+0x1d/0x89
[  532.657058]  [<ffffffff813b4d2a>] ? ip_forward+0x288/0x2e2
[  532.673652]  [<ffffffff813b36ec>] ? ip_rcv_finish+0x37c/0x396
[  532.691030]  [<ffffffff8100c8c5>] invalid_op+0x15/0x20
[  532.706582]  [<ffffffff81382538>] ? skb_put+0x1d/0x89
[  532.721876]  [<ffffffff8138a699>] ? napi_gro_receive+0x2a/0x2f
[  532.739518]  [<ffffffffa001a677>] ixgbe_clean_rx_irq+0x17f/0x53b [ixgbe]
[  532.759759]  [<ffffffffa001aa81>] ixgbe_clean_rxonly+0x4e/0xd2 [ixgbe]
[  532.779474]  [<ffffffff8138a7b5>] net_rx_action+0xb2/0x234
[  532.796068]  [<ffffffff8103fb35>] __do_softirq+0x10c/0x211
[  532.812664]  [<ffffffff8100cc2c>] call_softirq+0x1c/0x28
[  532.828738]  [<ffffffff8100de84>] do_softirq+0x34/0x72
[  532.844291]  [<ffffffff8103f83f>] irq_exit+0x36/0x85
[  532.859324]  [<ffffffff8100d75f>] do_IRQ+0xa6/0xbd
[  532.873838]  [<ffffffff8100c493>] ret_from_intr+0x0/0xa
[  532.889651]  <EOI>  [<ffffffff81012355>] ? mwait_idle+0x89/0x9f
[  532.907599]  [<ffffffff81012348>] ? mwait_idle+0x7c/0x9f
[  532.923676]  [<ffffffff8142c948>] ? atomic_notifier_call_chain+0xf/0x11
[  532.943658]  [<ffffffff8100aedb>] ? cpu_idle+0x4f/0xb3
[  532.959210]  [<ffffffff81424478>] ? start_secondary+0x17f/0x184



* Re: ixgbe: panic in ixgbe_clean_rx_irq()
  2009-07-23 15:21     ` Jesper Dangaard Brouer
@ 2009-07-24  8:41       ` Jesper Dangaard Brouer
  2009-07-24 18:47         ` Waskiewicz Jr, Peter P
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2009-07-24  8:41 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: netdev@vger.kernel.org

On Thu, 2009-07-23 at 17:21 +0200, Jesper Dangaard Brouer wrote:
> I'm going to run a test with a no-preempt kernel overnight...

The no-preempt kernel has been stable for 17 hours, while running a
4x10GbE pktgen load test (using 1024-byte packets).

Kernel git version (git describe):
 v2.6.31-rc1-932-g8e321c4

Git ("git log e594e96..8e321c4 drivers/net/ixgbe") reports no changes
to the ixgbe driver between these two kernel versions.

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: ixgbe: panic in ixgbe_clean_rx_irq()
  2009-07-24  8:41       ` Jesper Dangaard Brouer
@ 2009-07-24 18:47         ` Waskiewicz Jr, Peter P
  2009-07-27 21:50           ` Todd Merritt
  0 siblings, 1 reply; 9+ messages in thread
From: Waskiewicz Jr, Peter P @ 2009-07-24 18:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Waskiewicz Jr, Peter P, netdev@vger.kernel.org

On Fri, 24 Jul 2009, Jesper Dangaard Brouer wrote:

> On Thu, 2009-07-23 at 17:21 +0200, Jesper Dangaard Brouer wrote:
> > I'm going to run a test with a no-preempt kernel overnight...
> 
> The no-preempt kernel has been stable for 17 hours, while running a
> 4x10GbE pktgen load test (using 1024-byte packets).
> 
> Kernel git version (git describe):
>  v2.6.31-rc1-932-g8e321c4
> 
> Git ("git log e594e96..8e321c4 drivers/net/ixgbe") reports no changes
> to the ixgbe driver between these two kernel versions.

Thanks for the additional info Jesper.  This is definitely a good data 
point.  I'm not sure where to go with the preemption debugging at this 
point with our driver under heavy load.  We're still trying to repro here.  
I'll keep digging around though in the meantime.

This is the second preemption-induced bug from ixgbe we've had reported in 
a few days.  Eek!

Cheers,
-PJ Waskiewicz


* Re: ixgbe: panic in ixgbe_clean_rx_irq()
  2009-07-24 18:47         ` Waskiewicz Jr, Peter P
@ 2009-07-27 21:50           ` Todd Merritt
  2009-07-27 22:31             ` Waskiewicz Jr, Peter P
  0 siblings, 1 reply; 9+ messages in thread
From: Todd Merritt @ 2009-07-27 21:50 UTC (permalink / raw)
  To: netdev

Waskiewicz Jr, Peter P <peter.p.waskiewicz.jr <at> intel.com> writes:

> 
> On Fri, 24 Jul 2009, Jesper Dangaard Brouer wrote:
> 
> > On Thu, 2009-07-23 at 17:21 +0200, Jesper Dangaard Brouer wrote:
> > > I'm going to run a test with a no-preempt kernel overnight...
> > 
> > The no-preempt kernel has been stable for 17 hours, while running a
> > 4x10GbE pktgen load test (using 1024-byte packets).
> > 
> > Kernel git version (git describe):
> >  v2.6.31-rc1-932-g8e321c4
> > 
> > Git ("git log e594e96..8e321c4 drivers/net/ixgbe") reports no changes
> > to the ixgbe driver between these two kernel versions.
> 
> Thanks for the additional info Jesper.  This is definitely a good data 
> point.  I'm not sure where to go with the preemption debugging at this 
> point with our driver under heavy load.  We're still trying to repro here.  
> I'll keep digging around though in the meantime.
> 
> This is the second preemption-induced bug from ixgbe we've had reported in 
> a few days.  Eek!
> 
We're getting a panic in the same function with the stable version of the ixgbe
driver 2.0.38.2-1 on Red Hat 5 kernel-2.6.18-128.1.10.el5.  Is there a dev
release of the driver that might have a fix in it?

Thanks,
Todd





* Re: ixgbe: panic in ixgbe_clean_rx_irq()
  2009-07-27 21:50           ` Todd Merritt
@ 2009-07-27 22:31             ` Waskiewicz Jr, Peter P
  2009-07-28 13:37               ` Todd Merritt
  0 siblings, 1 reply; 9+ messages in thread
From: Waskiewicz Jr, Peter P @ 2009-07-27 22:31 UTC (permalink / raw)
  To: Todd Merritt; +Cc: netdev@vger.kernel.org

On Mon, 27 Jul 2009, Todd Merritt wrote:

> We're getting a panic in the same function with the stable version of the ixgbe
> driver 2.0.38.2-1 on Red Hat 5 kernel-2.6.18-128.1.10.el5.  Is there a dev
> release of the driver that might have a fix in it?

Are you also running with preemption enabled?  We're having a very tough 
time reproducing this bug in our lab, so no, we don't have any fix for it 
yet.

-PJ


* Re: ixgbe: panic in ixgbe_clean_rx_irq()
  2009-07-27 22:31             ` Waskiewicz Jr, Peter P
@ 2009-07-28 13:37               ` Todd Merritt
  0 siblings, 0 replies; 9+ messages in thread
From: Todd Merritt @ 2009-07-28 13:37 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: netdev@vger.kernel.org

No, we're running the stock kernel.  We can reliably reproduce it here.  
Please let me know if there's anything we can do to help you with 
providing a fix.

Thanks,
Todd

Waskiewicz Jr, Peter P wrote:
> On Mon, 27 Jul 2009, Todd Merritt wrote:
>
>> We're getting a panic in the same function with the stable version of the ixgbe
>> driver 2.0.38.2-1 on Red Hat 5 kernel-2.6.18-128.1.10.el5.  Is there a dev
>> release of the driver that might have a fix in it?
>
> Are you also running with preemption enabled?  We're having a very tough 
> time reproducing this bug in our lab, so no, we don't have any fix for it 
> yet.
>
> -PJ


