From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rob Herring
Subject: Re: panics in tcp_ack
Date: Mon, 03 Jun 2013 08:05:29 -0500
Message-ID: <51AC9499.8070207@gmail.com>
References: <51ABE067.2050507@gmail.com> <1370219787.24311.113.camel@edumazet-glaptop> <51ABFE10.1030206@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org
To: Eric Dumazet
Return-path:
Received: from mail-qe0-f49.google.com ([209.85.128.49]:44824 "EHLO mail-qe0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757475Ab3FCNFc (ORCPT ); Mon, 3 Jun 2013 09:05:32 -0400
Received: by mail-qe0-f49.google.com with SMTP id a11so2291429qen.22 for ; Mon, 03 Jun 2013 06:05:31 -0700 (PDT)
In-Reply-To: <51ABFE10.1030206@gmail.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 06/02/2013 09:23 PM, Rob Herring wrote:
> On 06/02/2013 07:36 PM, Eric Dumazet wrote:
>> On Sun, 2013-06-02 at 19:16 -0500, Rob Herring wrote:
>>> Sorry, this time with proper line wrapping...
>>>
>>> I'm debugging a kernel panic in the networking stack that happens with a
>>> cluster (20-40 nodes) of Calxeda highbank (ARM Cortex A9) nodes and
>>> typically only after 10-24 hours. The node are transferring files
>>> between nodes over TCP with 20 clients and servers per node. The kernel
>>> is based on ubuntu 3.5 kernel which is based on 3.5.7.11. So far testing
>>> has shown that 3.8.11 based (ubuntu raring) kernel is fixed. Attempts to
>>> bisect have not yielded results as it seems multiple problems mask the
>>> issue. Perhaps there is some new feature which has indirectly fixed the
>>> problem in 3.8.
>>>
>>> This commit appears to fix a similar panic and seems to reduce the
>>> frequency after picking it up in the latest 3.5 stable:
>>>
>>> commit 16fad69cfe4adbbfa813de516757b87bcae36d93
>>> Author: Eric Dumazet
>>> Date:   Thu Mar 14 05:40:32 2013 +0000
>>>
>>>     tcp: fix skb_availroom()
>>>     Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack :
>>>     https://code.google.com/p/chromium/issues/detail?id=182056
>>>     commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx
>>>     path) did a poor choice adding an 'avail_size' field to skb, while
>>>     what we really needed was a 'reserved_tailroom' one.
>>>     It would have avoided commit 22b4a4f22da (tcp: fix retransmit of
>>>     partially acked frames) and this commit.
>>>     Crash occurs because skb_split() is not aware of the 'avail_size'
>>>     management (and should not be aware)
>>>     Signed-off-by: Eric Dumazet
>>>     Reported-by: Mukesh Agrawal
>>>     Signed-off-by: David S. Miller
>>>
>>> I've searched thru 3.8 and 3.9 stable fixes looking for possibly
>>> relevant commits and applied these commits not in 3.5 stable. However,
>>> they have not helped:
>>>
>>> net: drop dst before queueing fragments
>>> tcp: call tcp_replace_ts_recent() from tcp_ack()
>>> tcp: Reallocate headroom if it would overflow csum_start
>>> tcp: incoming connections might use wrong route under synflood
>>>
>>
>> try also :
>>
>> commit 093162553c33e94 (tcp: force a dst refcount when prequeue packet)
>> commit 0d4f0608619de59 (tcp: dont handle MTU reduction on LISTEN socket)
>
> Will add and test.
>
>> commit 6731d2095bd4aef (tcp: fix for zero packets_in_flight was too
>> broad)
>> commit 2e5f421211ff76c (tcp: frto should not set snd_cwnd to 0)
>
> I have these 2.

Ran overnight with the 2 additional patches. One panic after ~9 hours
running on 75 nodes.

<4>[30632.185861] [] (tcp_ack+0x79c/0x1014) from [] (tcp_rcv_established+0x348/0x5e0)
<4>[30632.194903] [] (tcp_rcv_established+0x348/0x5e0) from [] (tcp_v4_do_rcv+0xf0/0x2cc)
<4>[30632.204291] [] (tcp_v4_do_rcv+0xf0/0x2cc) from [] (tcp_v4_rcv+0x834/0x918)
<4>[30632.212900] [] (tcp_v4_rcv+0x834/0x918) from [] (ip_local_deliver_finish+0xe8/0x33c)
<4>[30632.222376] [] (ip_local_deliver_finish+0xe8/0x33c) from [] (ip_rcv_finish+0x140/0x4c0)
<4>[30632.232115] [] (ip_rcv_finish+0x140/0x4c0) from [] (__netif_receive_skb+0x5e0/0x690)
<4>[30632.241590] [] (__netif_receive_skb+0x5e0/0x690) from [] (netif_receive_skb+0x1c/0x90)
<4>[30632.251240] [] (netif_receive_skb+0x1c/0x90) from [] (napi_skb_finish+0x54/0x78)
<4>[30632.260371] [] (napi_skb_finish+0x54/0x78) from [] (xgmac_poll+0x3ac/0x4ec)
<4>[30632.269066] [] (xgmac_poll+0x3ac/0x4ec) from [] (net_rx_action+0x140/0x228)
<4>[30632.277761] [] (net_rx_action+0x140/0x228) from [] (__do_softirq+0xb4/0x1cc)
<4>[30632.286541] [] (__do_softirq+0xb4/0x1cc) from [] (irq_exit+0x80/0x88)
<4>[30632.294716] [] (irq_exit+0x80/0x88) from [] (handle_IRQ+0x50/0xb0)
<4>[30632.302629] [] (handle_IRQ+0x50/0xb0) from [] (gic_handle_irq+0x24/0x58)
<4>[30632.311062] [] (gic_handle_irq+0x24/0x58) from [] (__irq_svc+0x40/0x50)
<4>[30632.319402] Exception stack(0xeca4dc10 to 0xeca4dc58)
<4>[30632.324445] dc00:                                     c2f7a580 02000020 02000000 00000000
<4>[30632.332615] dc20: c2f7a580 e9e4f33c e9e4f34c 00000000 ec185300 00001000 00000000 00001000
<4>[30632.340783] dc40: 00000001 eca4dc58 c0136cbc c0136cd4 200f0013 ffffffff
<4>[30632.347398] [] (__irq_svc+0x40/0x50) from [] (__set_page_dirty+0x80/0xc0)
<4>[30632.355919] [] (__set_page_dirty+0x80/0xc0) from [] (__block_commit_write+0xb4/0xe0)
<4>[30632.365394] [] (__block_commit_write+0xb4/0xe0) from [] (block_write_end+0x4c/0x84)
<4>[30632.374782] [] (block_write_end+0x4c/0x84) from [] (generic_write_end+0x34/0xb0)
<4>[30632.383911] [] (generic_write_end+0x34/0xb0) from [] (ext4_da_write_end+0xa4/0x340)
<4>[30632.393303] [] (ext4_da_write_end+0xa4/0x340) from [] (generic_file_buffered_write+0xe0/0x258)
<4>[30632.403648] [] (generic_file_buffered_write+0xe0/0x258) from [] (__generic_file_aio_write+0x274/0x4bc)
<4>[30632.414684] [] (__generic_file_aio_write+0x274/0x4bc) from [] (generic_file_aio_write+0x5c/0xc8)
<4>[30632.425201] [] (generic_file_aio_write+0x5c/0xc8) from [] (ext4_file_write+0xcc/0x2a0)
<4>[30632.434853] [] (ext4_file_write+0xcc/0x2a0) from [] (do_sync_write+0xa8/0xe8)
<4>[30632.443722] [] (do_sync_write+0xa8/0xe8) from [] (vfs_write+0x9c/0x170)
<4>[30632.452069] [] (vfs_write+0x9c/0x170) from [] (sys_write+0x38/0x70)
<4>[30632.460068] [] (sys_write+0x38/0x70) from [] (ret_fast_syscall+0x0/0x30)

The full stack looks like this:

include/linux/skbuff.h:__skb_unlink
include/net/tcp.h:tcp_unlink_write_queue
net/ipv4/tcp_input.c:tcp_clean_rtx_queue
net/ipv4/tcp_input.c:tcp_ack

This panic is in __skb_unlink with the skb prev ptr being NULL. Here's
the disassembly:

                if (!fully_acked)
c04070cc:       e3520000        cmp     r2, #0
c04070d0:       0afffecb        beq     c0406c04
extern void skb_unlink(struct sk_buff *skb, struct sk_buff_head *list);
static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
{
        struct sk_buff *next, *prev;

        list->qlen--;
c04070d4:       e59430a8        ldr     r3, [r4, #168]  ; 0xa8

static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
{
        sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
        sk->sk_wmem_queued -= skb->truesize;
        sk_mem_uncharge(sk, skb->truesize);
        __kfree_skb(skb);
c04070d8:       e1a00005        mov     r0, r5
c04070dc:       e2433001        sub     r3, r3, #1
c04070e0:       e58430a8        str     r3, [r4, #168]  ; 0xa8
        next       = skb->next;
        prev       = skb->prev;
c04070e4:       e895000c        ldm     r5, {r2, r3}
        skb->next  = skb->prev = NULL;
c04070e8:       e5859000        str     r9, [r5]
c04070ec:       e5859004        str     r9, [r5, #4]
        next->prev = prev;
c04070f0:       e5823004        str     r3, [r2, #4]
        prev->next = next;
c04070f4:       e5832000        str     r2, [r3]

Rob
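
For readers following the disassembly, here is a minimal userspace sketch
of the unlink steps shown in the __skb_unlink() source above. The names
struct node, struct queue, queue_init(), queue_add_tail() and
queue_unlink_checked() are hypothetical stand-ins, not kernel API, and only
the list linkage is modelled. The NULL check is an illustrative debugging
aid, not something the kernel code does. Unlinking the same node twice is
used here only as one easy way to produce NULL links; it is an assumption
for the demo, not a diagnosis of this panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: list linkage only. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Hypothetical stand-in for struct sk_buff_head. */
struct queue {
    struct node head;      /* sentinel; an empty queue points at itself */
    unsigned int qlen;
};

static void queue_init(struct queue *q)
{
    assert(q != NULL);
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
}

static void queue_add_tail(struct queue *q, struct node *n)
{
    n->next = &q->head;
    n->prev = q->head.prev;
    q->head.prev->next = n;
    q->head.prev = n;
    q->qlen++;
}

/*
 * Same pointer updates as the quoted __skb_unlink() source, but the
 * links are checked first. Returns false, leaving the queue untouched,
 * when the node has a NULL next or prev pointer.
 */
static bool queue_unlink_checked(struct queue *q, struct node *n)
{
    struct node *next = n->next;
    struct node *prev = n->prev;

    if (next == NULL || prev == NULL)
        return false;          /* node is not on a list */

    q->qlen--;
    n->next = n->prev = NULL;  /* a second unlink now sees NULL links */
    next->prev = prev;
    prev->next = next;         /* this store faults if prev were NULL */
    return true;
}
```

Without the check, a second unlink of the same node would dereference the
NULL pointers stored by the first one, which is the kind of fault the
"prev->next = next" store in the listing above would produce.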