netdev.vger.kernel.org archive mirror
* crash in tcp_fragment
From: Tim Hartrick @ 2012-02-27  2:57 UTC (permalink / raw)
  To: netdev



Netdev,

	We have been seeing the crash cited below on a number of our systems
running kernel versions 2.6.34, 2.6.36 and 2.6.38 using a number of
different GigE controllers.  I note that commit
2fceec13375e5d98ef033c6b0ee03943fc460950 introduced a band-aid for the
problem by replacing the BUG_ON() with a WARN().  I have two questions
related to this.

1) Is there a fix for the root cause?  Can I get a pointer to the commit
that claims to address the root cause?

2) Will disabling GSO/TSO make the problem go away?  That is, is
something related to GSO/TSO at the root of the problem?


Thanks

Tim Hartrick

PID: 0      TASK: ffff880bff2e5b80  CPU: 2   COMMAND: "kworker/0:1"
 #0 [ffff880c2fc23580] machine_kexec at ffffffff81032b49
 #1 [ffff880c2fc235f0] crash_kexec at ffffffff810ac042
 #2 [ffff880c2fc236c0] oops_end at ffffffff815d6338
 #3 [ffff880c2fc236f0] die at ffffffff8100fd0b
 #4 [ffff880c2fc23720] do_trap at ffffffff815d5c14
 #5 [ffff880c2fc23780] do_invalid_op at ffffffff8100d9a5
 #6 [ffff880c2fc23820] invalid_op at ffffffff8100ccdb
    [exception RIP: tcp_fragment+818]
    RIP: ffffffff8152fac2  RSP: ffff880c2fc238d0  RFLAGS: 00010287
    RAX: 0000000000000007  RBX: ffff880b35f10000  RCX: 00000000000005b0
    RDX: 00000000000027d0  RSI: ffff880b35f10000  RDI: ffff88084946ce00
    RBP: ffff880c2fc23920   R8: 00000000000027d0   R9: 00000000a5041694
    R10: dead000000200200  R11: 0000000000000000  R12: 0000000000002230
    R13: ffff88084946ce00  R14: 0000000000000016  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880c2fc23928] tcp_mark_head_lost at ffffffff815254f6
 #8 [ffff880c2fc23978] tcp_update_scoreboard at ffffffff8152560b
 #9 [ffff880c2fc23998] tcp_fastretrans_alert at ffffffff8152a5ca
#10 [ffff880c2fc239e8] tcp_ack at ffffffff8152c464
#11 [ffff880c2fc23a58] tcp_rcv_established at ffffffff8152d1b0
#12 [ffff880c2fc23aa8] tcp_v4_do_rcv at ffffffff815353b5
#13 [ffff880c2fc23ad8] tcp_v4_rcv at ffffffff81536ba9
#14 [ffff880c2fc23b58] ip_local_deliver_finish at ffffffff8151372d
#15 [ffff880c2fc23b88] ip_local_deliver at ffffffff81513960
#16 [ffff880c2fc23bb8] ip_rcv_finish at ffffffff81512f31
#17 [ffff880c2fc23be8] ip_rcv at ffffffff8151357d
#18 [ffff880c2fc23c28] __netif_receive_skb at ffffffff814dd24a
#19 [ffff880c2fc23ca8] netif_receive_skb at ffffffff814e2910
#20 [ffff880c2fc23ce8] napi_skb_finish at ffffffff814e2a70
#21 [ffff880c2fc23d08] napi_gro_receive at ffffffff814e2f05
#22 [ffff880c2fc23d28] bnx2_rx_int at ffffffffa013866a
#23 [ffff880c2fc23df8] bnx2_poll_work at ffffffffa0138b10
#24 [ffff880c2fc23e28] bnx2_poll_msix at ffffffffa0138b7d
#25 [ffff880c2fc23e68] net_rx_action at ffffffff814e30e8
#26 [ffff880c2fc23ec8] __do_softirq at ffffffff8106beeb
#27 [ffff880c2fc23f38] call_softirq at ffffffff8100cf5c
#28 [ffff880c2fc23f50] do_softirq at ffffffff8100e9c5
#29 [ffff880c2fc23f70] irq_exit at ffffffff8106bdb5
#30 [ffff880c2fc23f80] do_IRQ at ffffffff815dcf66
--- <IRQ stack> ---
#31 [ffff880bff2efda0] ret_from_intr at ffffffff815d5393
    RIP: ffffffffffffff73  RSP: 0000000000000202  RFLAGS: 00000010
    RAX: 00000000fffffffd  RBX: ffff880bff2efea8  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000489  RDI: 0000000000000000
    RBP: ffffffff815d538e   R8: 0000000000000320   R9: 0000000000000001
    R10: 0000000000000000  R11: ffff88019462adc0  R12: ffff880bff2efe18
    R13: ffffffff81051f50  R14: ffff880bff2efdc8  R15: ffffffff812de426
    ORIG_RAX: 000000000011bae9  CS: ffffffff81332a71  SS: ffff880bff2efe58


* Re: crash in tcp_fragment
From: Ilpo Järvinen @ 2012-02-27 23:18 UTC (permalink / raw)
  To: Tim Hartrick; +Cc: Netdev

On Sun, 26 Feb 2012, Tim Hartrick wrote:

> 	We have been seeing the crash cited below on a number of our systems
> running kernel versions 2.6.34, 2.6.36 and 2.6.38 using a number of
> different GigE controllers.  I note that commit
> 2fceec13375e5d98ef033c6b0ee03943fc460950 introduced a band-aid for the
> problem by replacing the BUG_ON() with a WARN().  I have two questions
> related to this.
> 
> 1) Is there a fix for the root cause?  Can I get a pointer to the commit
> that claims to address the root cause?

There have been multiple recent fixes to the counters this code depends
on; however, as far as I can tell, nobody has fully analyzed what else
those fixes resolved.

> 2) Will disabling GSO/TSO make the problem go away?  That is, is
> something related to GSO/TSO at the root of the problem?

Very likely.
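
A minimal sketch of how the offloads could be switched off to test this,
assuming the ethtool utility is installed and the interface is named
eth0 (neither is stated in this thread).  The helper names offload_cmd
and apply_offload are hypothetical; offload_cmd only builds the command
line, and actually applying it requires root.

```python
import subprocess


def offload_cmd(iface, features=("tso", "gso"), state="off"):
    """Build an `ethtool -K` argument list setting each feature to `state`."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        # ethtool -K takes "<feature> on|off" pairs after the interface name.
        cmd += [feature, state]
    return cmd


def apply_offload(iface, **kwargs):
    """Run the command; needs root and an existing interface."""
    subprocess.run(offload_cmd(iface, **kwargs), check=True)


if __name__ == "__main__":
    # "eth0" is an assumed interface name; substitute the real one.
    print(" ".join(offload_cmd("eth0")))
```

The current settings can be inspected with "ethtool -k <iface>" before
and after the change.  Turning the offloads off will most likely cost
some CPU, so this is better suited to confirming the diagnosis than to
serving as a permanent fix.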

-- 
 i.


* Re: crash in tcp_fragment
From: Tim Hartrick @ 2012-02-27 23:29 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev


Ilpo,

Thanks for the answers.  This gives me some clarity on how to address
the problem.


Tim Hartrick



* Re: crash in tcp_fragment
From: Neal Cardwell @ 2012-03-03  7:38 UTC (permalink / raw)
  To: tim; +Cc: Ilpo Järvinen, Netdev

Here is a proposed fix I just posted for this issue:

  http://patchwork.ozlabs.org/patch/144408/

neal


