From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rob Herring
Subject: Re: panics in tcp_ack
Date: Sun, 02 Jun 2013 21:23:12 -0500
Message-ID: <51ABFE10.1030206@gmail.com>
References: <51ABE067.2050507@gmail.com> <1370219787.24311.113.camel@edumazet-glaptop>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org
To: Eric Dumazet
Return-path:
Received: from mail-qc0-f181.google.com ([209.85.216.181]:44599 "EHLO mail-qc0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756538Ab3FCCaN (ORCPT ); Sun, 2 Jun 2013 22:30:13 -0400
Received: by mail-qc0-f181.google.com with SMTP id u11so1860423qcx.26 for ; Sun, 02 Jun 2013 19:30:12 -0700 (PDT)
In-Reply-To: <1370219787.24311.113.camel@edumazet-glaptop>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 06/02/2013 07:36 PM, Eric Dumazet wrote:
> On Sun, 2013-06-02 at 19:16 -0500, Rob Herring wrote:
>> Sorry, this time with proper line wrapping...
>>
>> I'm debugging a kernel panic in the networking stack that happens with a
>> cluster (20-40 nodes) of Calxeda highbank (ARM Cortex A9) nodes and
>> typically only after 10-24 hours. The nodes are transferring files
>> between nodes over TCP with 20 clients and servers per node. The kernel
>> is based on ubuntu 3.5 kernel which is based on 3.5.7.11. So far testing
>> has shown that 3.8.11 based (ubuntu raring) kernel is fixed. Attempts to
>> bisect have not yielded results as it seems multiple problems mask the
>> issue. Perhaps there is some new feature which has indirectly fixed the
>> problem in 3.8.
>>
>> This commit appears to fix a similar panic and seems to reduce the
>> frequency after picking it up in the latest 3.5 stable:
>>
>> commit 16fad69cfe4adbbfa813de516757b87bcae36d93
>> Author: Eric Dumazet
>> Date: Thu Mar 14 05:40:32 2013 +0000
>>
>> tcp: fix skb_availroom()
>> Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack :
>> https://code.google.com/p/chromium/issues/detail?id=182056
>> commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx
>> path) did a poor choice adding an 'avail_size' field to skb, while
>> what we really needed was a 'reserved_tailroom' one.
>> It would have avoided commit 22b4a4f22da (tcp: fix retransmit of
>> partially acked frames) and this commit.
>> Crash occurs because skb_split() is not aware of the 'avail_size'
>> management (and should not be aware)
>> Signed-off-by: Eric Dumazet
>> Reported-by: Mukesh Agrawal
>> Signed-off-by: David S. Miller
>>
>> I've searched thru 3.8 and 3.9 stable fixes looking for possibly
>> relevant commits and applied these commits not in 3.5 stable. However,
>> they have not helped:
>>
>> net: drop dst before queueing fragments
>> tcp: call tcp_replace_ts_recent() from tcp_ack()
>> tcp: Reallocate headroom if it would overflow csum_start
>> tcp: incoming connections might use wrong route under synflood
>>
>
> try also :
>
> commit 093162553c33e94 (tcp: force a dst refcount when prequeue packet)
> commit 0d4f0608619de59 (tcp: dont handle MTU reduction on LISTEN socket)

Will add and test.

> commit 6731d2095bd4aef (tcp: fix for zero packets_in_flight was too
> broad)
> commit 2e5f421211ff76c (tcp: frto should not set snd_cwnd to 0)

I have these 2.

Meanwhile, here's another panic. This one is because struct tcphdr *th
is NULL which means skb->head is NULL. The skb is not NULL.
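
To spell out the reasoning in the paragraph above, here is a minimal
user-space sketch. It assumes the TCP header is located as an offset from
the start of the buffer. The names fake_skb, fake_transport_header and
header_is_null are hypothetical stand-ins, not the kernel's definitions.
Some kernel configurations may store the header position as a pointer
rather than an offset, in which case a NULL th would not by itself imply a
NULL head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct sk_buff (not the kernel layout). */
struct fake_skb {
    unsigned char *head;        /* start of the packet buffer */
    unsigned int transport_off; /* assumed: TCP header offset from head */
};

/* Address of the transport header under the offset-based assumption.
 * Integer arithmetic is used so the NULL case stays well defined in C. */
static uintptr_t fake_transport_header(const struct fake_skb *skb)
{
    return (uintptr_t)skb->head + skb->transport_off;
}

/* Returns 1 when the skb itself is valid but the computed header is NULL,
 * which in this model requires both head == NULL and a zero offset. */
static int header_is_null(const struct fake_skb *skb)
{
    assert(skb != NULL); /* matches the observation that the skb is not NULL */
    return fake_transport_header(skb) == 0;
}
```

In this model a non-NULL skb with head == NULL and a zero offset gives a
NULL header, while any skb with a real buffer does not.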

<4>[84967.163498] pc : [] lr : [] psr: 600e0013
<4>[84967.163498] sp : ed335cc8 ip : 00000001 fp : 00000400
<4>[84967.174970] r10: ed346e34 r9 : 00000001 r8 : c06d71b8
<4>[84967.180188] r7 : 00000000 r6 : 00000000 r5 : ecd85840 r4 : ecd85840
<4>[84967.186709] r3 : 00000020 r2 : 0000003a r1 : a4051080 r0 : ed346e00
<4>[84967.193234] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
<4>[84967.200365] Control: 10c5387d Table: 2d08804a DAC: 00000015
<0>[84967.206109] Process python (pid: 883, stack limit = 0xed3342f0)
<0>[84967.212021] Stack: (0xed335cc8 to 0xed336000)
<0>[84967.216373] 5cc0: 000005a8 00000000 ed346e00 c040ac08 c06a5a00 ecd85840
<0>[84967.224549] 5ce0: ed346e00 ed346e00 00000000 c06d71b8 ed346e34 c040eda8 ed346ea0 00000000
<0>[84967.232720] 5d00: 00000000 00000000 e9805380 0000000a 0000001c ecd85840 00000000 ed346e00
<0>[84967.240897] 5d20: 00000000 c03b1d78 e9805380 ed346e00 0000fe88 3a61054b 00000400 00df2c34
<0>[84967.249075] 5d40: 00000040 c03fd2b8 0000a400 edf8c840 ed335eb0 ed335ed8 c23212f0 c23212e0
<0>[84967.257249] 5d60: 00df2c34 c17720e0 0000000e 00000400 00000400 000005a8 00000040 ed346ea0
<0>[84967.265419] 5d80: 00000000 00000000 ed334000 00000001 00010e30 00000630 00000000 00000000
<0>[84967.273591] 5da0: 0000000e 0000fe88 00000000 c06d6040 c2aeb380 ed346e00 ed335e30 eca26000
<0>[84967.281763] 5dc0: ed335ed8 00000400 00df2834 00000000 00000003 c041ea58 c795c2e8 ed4ecb50
<0>[84967.289935] 5de0: 00000000 ed335df0 eca26000 c03aef74 51ab6eeb 263fddc0 00000000 00000400
<0>[84967.298105] 5e00: eca26000 00000000 00000000 ed335ed8 01d0d6eb c00cb4d8 00000056 00000000
<0>[84967.306294] 5e20: 91827364 ed335e24 00001000 00000001 ed9b4050 00000000 00000000 00000001
<0>[84967.314472] 5e40: ffffffff 00000000 00000000 00000000 00000000 00000000 ecc3de80 00000001
<0>[84967.322642] 5e60: 00000000 00000000 00001000 00000000 ed335df0 00000000 00001000 c0012f28
<0>[84967.330812] 5e80: fee00100 0002c000 00000000 ed335f88 ed9b4000 fffffdee ed334000 00000001
<0>[84967.338983] 5ea0: b6ae35f8 c010aa38 0002c000 00000000 00000400 eca26000 c06a4508 00000000
<0>[84967.347152] 5ec0: 00000040 c03b07d4 fffffff7 00000000 00df2834 00000400 00000000 00000000
<0>[84967.355321] 5ee0: ed335ed0 00000001 00000000 00000000 00000040 00000000 00000000 c0223254
<0>[84967.363495] 5f00: 00001000 00000000 00001000 00000000 00000001 ed9b4008 600e0013 ffffffff
<0>[84967.371666] 5f20: c000dbc4 c06ff504 ffffffff 00000000 00014be7 03614c11 ed335f90 00000000
<0>[84967.379858] 5f40: 0000000a ed335f68 c000dd28 ed334000 00000000 00000003 0000000a 0000000a
<0>[84967.388032] 5f60: 00000000 0002c000 00014bf1 00002710 00000001 271ae81b b6aecd90 00000000
<0>[84967.396203] 5f80: 00d25050 00000121 c000dd28 ed334000 00000000 c03b0828 00000000 00000000
<0>[84967.404376] 5fa0: be8f2890 c000db60 b6aecd90 00000000 00000006 00df2834 00000400 00000000
<0>[84967.412547] 5fc0: b6aecd90 00000000 00d25050 00000121 00000400 00df2834 b6ad4fd0 00000003
<0>[84967.420719] 5fe0: 00000000 be8f289c 000a5505 b6f7398c 600e0010 00000006 00000000 00000000
<4>[84967.428912] [] (tcp_rcv_established+0x20/0x5e0) from [] (tcp_v4_do_rcv+0xf0/0x2cc)
<4>[84967.438252] [] (tcp_v4_do_rcv+0xf0/0x2cc) from [] (release_sock+0x84/0xfc)
<4>[84967.446900] [] (release_sock+0x84/0xfc) from [] (tcp_sendmsg+0x378/0xcdc)
<4>[84967.455439] [] (tcp_sendmsg+0x378/0xcdc) from [] (inet_sendmsg+0x80/0xb8)
<4>[84967.463966] [] (inet_sendmsg+0x80/0xb8) from [] (sock_sendmsg+0xcc/0xec)
<4>[84967.472404] [] (sock_sendmsg+0xcc/0xec) from [] (sys_sendto+0xc0/0xfc)
<4>[84967.480670] [] (sys_sendto+0xc0/0xfc) from [] (sys_send+0x18/0x20)
<4>[84967.488599] [] (sys_send+0x18/0x20) from [] (ret_fast_syscall+0x0/0x30)

Rob