* crash in tcp_fragment
@ 2012-02-27 2:57 Tim Hartrick
2012-02-27 23:18 ` Ilpo Järvinen
0 siblings, 1 reply; 4+ messages in thread
From: Tim Hartrick @ 2012-02-27 2:57 UTC
To: netdev
Netdev,
We have been seeing the crash cited below on a number of our systems
running kernel versions 2.6.34, 2.6.36 and 2.6.38 with several
different GigE controllers. I note that commit
2fceec13375e5d98ef033c6b0ee03943fc460950 introduced a band-aid for the
problem by replacing the BUG_ON() with a WARN(). I have two questions
related to this.
1) Is there a fix for the root cause? Can I get a pointer to the commit
that claims to address the root cause?
2) Will disabling GSO/TSO make the problem go away? That is, is
something related to GSO/TSO at the root of the problem?
Thanks
Tim Hartrick
PID: 0 TASK: ffff880bff2e5b80 CPU: 2 COMMAND: "kworker/0:1"
#0 [ffff880c2fc23580] machine_kexec at ffffffff81032b49
#1 [ffff880c2fc235f0] crash_kexec at ffffffff810ac042
#2 [ffff880c2fc236c0] oops_end at ffffffff815d6338
#3 [ffff880c2fc236f0] die at ffffffff8100fd0b
#4 [ffff880c2fc23720] do_trap at ffffffff815d5c14
#5 [ffff880c2fc23780] do_invalid_op at ffffffff8100d9a5
#6 [ffff880c2fc23820] invalid_op at ffffffff8100ccdb
[exception RIP: tcp_fragment+818]
RIP: ffffffff8152fac2 RSP: ffff880c2fc238d0 RFLAGS: 00010287
RAX: 0000000000000007 RBX: ffff880b35f10000 RCX: 00000000000005b0
RDX: 00000000000027d0 RSI: ffff880b35f10000 RDI: ffff88084946ce00
RBP: ffff880c2fc23920 R8: 00000000000027d0 R9: 00000000a5041694
R10: dead000000200200 R11: 0000000000000000 R12: 0000000000002230
R13: ffff88084946ce00 R14: 0000000000000016 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880c2fc23928] tcp_mark_head_lost at ffffffff815254f6
#8 [ffff880c2fc23978] tcp_update_scoreboard at ffffffff8152560b
#9 [ffff880c2fc23998] tcp_fastretrans_alert at ffffffff8152a5ca
#10 [ffff880c2fc239e8] tcp_ack at ffffffff8152c464
#11 [ffff880c2fc23a58] tcp_rcv_established at ffffffff8152d1b0
#12 [ffff880c2fc23aa8] tcp_v4_do_rcv at ffffffff815353b5
#13 [ffff880c2fc23ad8] tcp_v4_rcv at ffffffff81536ba9
#14 [ffff880c2fc23b58] ip_local_deliver_finish at ffffffff8151372d
#15 [ffff880c2fc23b88] ip_local_deliver at ffffffff81513960
#16 [ffff880c2fc23bb8] ip_rcv_finish at ffffffff81512f31
#17 [ffff880c2fc23be8] ip_rcv at ffffffff8151357d
#18 [ffff880c2fc23c28] __netif_receive_skb at ffffffff814dd24a
#19 [ffff880c2fc23ca8] netif_receive_skb at ffffffff814e2910
#20 [ffff880c2fc23ce8] napi_skb_finish at ffffffff814e2a70
#21 [ffff880c2fc23d08] napi_gro_receive at ffffffff814e2f05
#22 [ffff880c2fc23d28] bnx2_rx_int at ffffffffa013866a
#23 [ffff880c2fc23df8] bnx2_poll_work at ffffffffa0138b10
#24 [ffff880c2fc23e28] bnx2_poll_msix at ffffffffa0138b7d
#25 [ffff880c2fc23e68] net_rx_action at ffffffff814e30e8
#26 [ffff880c2fc23ec8] __do_softirq at ffffffff8106beeb
#27 [ffff880c2fc23f38] call_softirq at ffffffff8100cf5c
#28 [ffff880c2fc23f50] do_softirq at ffffffff8100e9c5
#29 [ffff880c2fc23f70] irq_exit at ffffffff8106bdb5
#30 [ffff880c2fc23f80] do_IRQ at ffffffff815dcf66
--- <IRQ stack> ---
#31 [ffff880bff2efda0] ret_from_intr at ffffffff815d5393
RIP: ffffffffffffff73 RSP: 0000000000000202 RFLAGS: 00000010
RAX: 00000000fffffffd RBX: ffff880bff2efea8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000489 RDI: 0000000000000000
RBP: ffffffff815d538e R8: 0000000000000320 R9: 0000000000000001
R10: 0000000000000000 R11: ffff88019462adc0 R12: ffff880bff2efe18
R13: ffffffff81051f50 R14: ffff880bff2efdc8 R15: ffffffff812de426
ORIG_RAX: 000000000011bae9 CS: ffffffff81332a71 SS: ffff880bff2efe58
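The band-aid Tim mentions (commit 2fceec13375e5d98ef033c6b0ee03943fc460950, replacing a BUG_ON() with a WARN()) turns a fatal assertion into a logged warning that lets the caller back out. The user-space C sketch below illustrates that difference. BUG_ON_SKETCH, WARN_ON_SKETCH, split_fatal and split_checked are hypothetical names, not kernel APIs, and the "split length larger than the buffer" test is only modeled loosely on the kind of sanity check tcp_fragment performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for BUG_ON(): report and kill the process,
 * much as a kernel BUG_ON() oopses and takes the machine down. */
#define BUG_ON_SKETCH(cond)                                           \
    do {                                                              \
        if (cond) {                                                   \
            fprintf(stderr, "BUG at %s:%d: %s\n", __FILE__, __LINE__, \
                    #cond);                                           \
            abort();                                                  \
        }                                                             \
    } while (0)

/* Helper for the WARN_ON() stand-in: log if the condition holds and
 * return it, so the caller can decide how to recover. */
static int warn_on_report(int cond, const char *expr, const char *file,
                          int line)
{
    if (cond)
        fprintf(stderr, "WARNING at %s:%d: %s\n", file, line, expr);
    return cond;
}

/* Hypothetical stand-in for WARN_ON(): like the kernel macro, it
 * evaluates to the (normalized) condition. */
#define WARN_ON_SKETCH(cond) \
    warn_on_report(!!(cond), #cond, __FILE__, __LINE__)

/* Old behaviour: an impossible split length aborts everything. */
void split_fatal(unsigned int buf_len, unsigned int len,
                 unsigned int *head, unsigned int *tail)
{
    BUG_ON_SKETCH(len > buf_len);
    *head = len;
    *tail = buf_len - len;
}

/* Band-aid behaviour: warn, leave the outputs untouched and return
 * -EINVAL so the caller can skip the operation. */
int split_checked(unsigned int buf_len, unsigned int len,
                  unsigned int *head, unsigned int *tail)
{
    if (WARN_ON_SKETCH(len > buf_len))
        return -EINVAL;
    *head = len;
    *tail = buf_len - len;
    return 0;
}
```

A caller that gets -EINVAL can drop the operation and carry on. That keeps the machine up but, as Tim's first question implies, leaves whatever produced the bad length in place.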
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: crash in tcp_fragment
From: Ilpo Järvinen @ 2012-02-27 23:18 UTC
To: Tim Hartrick; +Cc: Netdev
On Sun, 26 Feb 2012, Tim Hartrick wrote:
> We have been seeing the crash cited below on a number of our systems
> running kernel versions 2.6.34, 2.6.36 and 2.6.38 with several
> different GigE controllers. I note that commit
> 2fceec13375e5d98ef033c6b0ee03943fc460950 introduced a band-aid for the
> problem by replacing the BUG_ON() with a WARN(). I have two questions
> related to this.
>
> 1) Is there a fix for the root cause? Can I get a pointer to the commit
> that claims to address the root cause?
There have been multiple fixes recently to the counters this code
depends on; however, as far as I can tell, nobody has fully analyzed
what else those fixes addressed.
> 2) Will disabling GSO/TSO make the problem go away? That is, is
> something related to GSO/TSO at the root of the problem?
Very likely.
--
i.
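Ilpo's answer ties the crash to GSO/TSO. One way to see why: with TSO or GSO a single buffer carries several MSS-sized segments, and TCP's loss accounting counts it as that many packets, so marking only part of it lost means splitting it at a byte offset derived from segment counts. The C sketch below is a hypothetical user-space model of that arithmetic; seg_count and plan_split are illustrative names, not kernel functions, and the "segments times MSS" split rule is an assumption. It shows how a segment counter that disagrees with the buffer's real length yields an offset past the end of the buffer, the kind of condition the BUG_ON()/WARN() guards against. On a live system the offloads are usually switched off per interface, for example with "ethtool -K eth0 tso off gso off".

```c
#include <assert.h>

/* Hypothetical model, not kernel code. A TSO/GSO buffer of len bytes
 * sent with segment size mss counts as ceil(len / mss) packets, the
 * same rounding as DIV_ROUND_UP(skb->len, mss) in the kernel. A buffer
 * no larger than one MSS counts as a single packet. */
unsigned int seg_count(unsigned int len, unsigned int mss)
{
    if (len <= mss)
        return 1;
    return (len + mss - 1) / mss;
}

/* Plan marking the first "want" segments of a buffer as lost, given
 * "believed" segments recorded for it by some counter.
 * Returns 0 if no split is needed, 1 if a split at want * mss bytes
 * fits inside the buffer, and -1 if that offset lies past the end of
 * the buffer, which only happens when "believed" is inconsistent with
 * the real length. */
int plan_split(unsigned int len, unsigned int mss,
               unsigned int believed, unsigned int want)
{
    if (want == 0 || want >= believed)
        return 0;  /* nothing to mark, or the whole buffer is marked */
    if (want * mss > len)
        return -1; /* stale counter: a length sanity check would fire */
    return 1;
}
```

In this model a buffer no larger than one MSS always counts as one segment, so a partial split is never requested, which is consistent with Ilpo's expectation that disabling the offloads makes the problem go away.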
* Re: crash in tcp_fragment
From: Tim Hartrick @ 2012-02-27 23:29 UTC
To: Ilpo Järvinen; +Cc: Netdev
Ilpo,
Thanks for the answers. This gives me some clarity on how to address
the problem.
Tim Hartrick
* Re: crash in tcp_fragment
From: Neal Cardwell @ 2012-03-03 7:38 UTC
To: tim; +Cc: Ilpo Järvinen, Netdev
Here is a proposed fix I just posted for this issue:
http://patchwork.ozlabs.org/patch/144408/
neal