From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: Fw: [Bug 14470] New: freez in TCP stack
Date: Thu, 29 Oct 2009 06:35:08 +0100
Message-ID: <4AE9298C.1000204@gmail.com>
References: <20091026084132.57bc3d07@nehalam> <20091028151313.ba4a4d23.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Stephen Hemminger, netdev@vger.kernel.org, kolo@albatani.cz,
 bugzilla-daemon@bugzilla.kernel.org
To: Andrew Morton
Return-path:
Received: from gw1.cosmosbay.com ([212.99.114.194]:38291 "EHLO
 gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1752248AbZJ2FfN (ORCPT); Thu, 29 Oct 2009 01:35:13 -0400
In-Reply-To: <20091028151313.ba4a4d23.akpm@linux-foundation.org>
Sender: netdev-owner@vger.kernel.org
List-ID:

Andrew Morton wrote:
> On Mon, 26 Oct 2009 08:41:32 -0700
> Stephen Hemminger wrote:
>
>>
>> Begin forwarded message:
>>
>> Date: Mon, 26 Oct 2009 12:47:22 GMT
>> From: bugzilla-daemon@bugzilla.kernel.org
>> To: shemminger@linux-foundation.org
>> Subject: [Bug 14470] New: freez in TCP stack
>>
>
> Stephen, please retain the bugzilla and reporter email cc's when
> forwarding a report to a mailing list.
>
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=14470
>>
>> Summary: freez in TCP stack
>> Product: Networking
>> Version: 2.5
>> Kernel Version: 2.6.31
>> Platform: All
>> OS/Version: Linux
>> Tree: Mainline
>> Status: NEW
>> Severity: high
>> Priority: P1
>> Component: IPV4
>> AssignedTo: shemminger@linux-foundation.org
>> ReportedBy: kolo@albatani.cz
>> Regression: No
>>
>>
>> We are hitting kernel panics on Dell R610 servers with e1000e NICs; it
>> appears usually under high network traffic (around 100 Mbit/s), but that is
>> not a rule: it has happened even under low traffic.
>>
>> The servers are used as reverse HTTP proxies (varnish).
>>
>> On 6 identical servers this panic happens approximately twice a day,
>> depending on network load.
>> The machine completely freezes until the management watchdog reboots it.
>>
>
> Twice a day on six separate machines.  That ain't no hardware glitch.
>
> Vaclav, are you able to say whether this is a regression?  Did those
> machines run 2.6.30 (for example)?
>
> Thanks.
>
>> We had to put a serial console on these servers to catch the oops. Is there
>> anything else we can do to debug this?
>> The RIP is always the same:
>>
>> RIP: 0010:[] []
>> tcp_xmit_retransmit_queue+0x8c/0x290
>>
>> The rest of the oops always differs a little ... here is an example:
>>
>> RIP: 0010:[] []
>> tcp_xmit_retransmit_queue+0x8c/0x290
>> RSP: 0018:ffffc90000003a40 EFLAGS: 00010246
>> RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000
>> RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0
>> RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000
>> FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process swapper (pid: 0, threadinfo ffffffff81608000, task ffffffff81631440)
>> Stack:
>> ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101
>> <0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000
>> <0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000
>> Call Trace:
>>
>> [] tcp_ack+0x170a/0x1dd0
>> [] tcp_rcv_state_process+0x122/0xab0
>> [] tcp_v4_do_rcv+0xac/0x220
>> [] ? nf_iterate+0x5f/0x90
>> [] tcp_v4_rcv+0x586/0x6b0
>> [] ? nf_hook_slow+0x65/0xf0
>> [] ? ip_local_deliver_finish+0x0/0x120
>> [] ip_local_deliver_finish+0x5f/0x120
>> [] ip_local_deliver+0x3b/0x90
>> [] ip_rcv_finish+0x141/0x340
>> [] ip_rcv+0x24f/0x350
>> [] netif_receive_skb+0x20d/0x2f0
>> [] napi_skb_finish+0x40/0x50
>> [] napi_gro_receive+0x34/0x40
>> [] e1000_receive_skb+0x48/0x60
>> [] e1000_clean_rx_irq+0xf2/0x330
>> [] e1000_clean+0x81/0x2a0
>> [] ? ktime_get+0x11/0x50
>> [] net_rx_action+0x9c/0x130
>> [] ? get_next_timer_interrupt+0x1d0/0x210
>> [] __do_softirq+0xb7/0x160
>> [] call_softirq+0x1c/0x30
>> [] do_softirq+0x3d/0x80
>> [] irq_exit+0x7b/0x90
>> [] do_IRQ+0x73/0xe0
>> [] ret_from_intr+0x0/0xa
>>
>> [] ? acpi_idle_enter_bm+0x245/0x271
>> [] ? acpi_idle_enter_bm+0x23b/0x271
>> [] ? cpuidle_idle_call+0x98/0xf0
>> [] ? cpu_idle+0x94/0xd0
>> [] ? rest_init+0x66/0x70
>> [] ? start_kernel+0x2ef/0x340
>> [] ? x86_64_start_reservations+0x84/0x90
>> [] ? x86_64_start_kernel+0xd2/0x100
>> Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f b6 cd
>> 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 24 4d 39 f4
>> 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01
>> RIP [] tcp_xmit_retransmit_queue+0x8c/0x290
>> RSP
>> CR2: 0000000000000000
>> ---[ end trace d97d99c9ae1d52cc ]---
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Pid: 0, comm: swapper Tainted: G D 2.6.31 #2
>> Call Trace:
>> [] panic+0xa0/0x170
>> [] ? ret_from_intr+0x0/0xa
>> [] ? print_oops_end_marker+0x1e/0x20
>> [] oops_end+0x9e/0xb0
>> [] no_context+0x15a/0x250
>> [] __bad_area_nosemaphore+0xdb/0x1c0
>> [] ? dev_hard_start_xmit+0x269/0x2f0
>> [] bad_area_nosemaphore+0xe/0x10
>> [] do_page_fault+0x17f/0x260
>> [] page_fault+0x1f/0x30
>> [] ? tcp_xmit_retransmit_queue+0x8c/0x290
>> [] tcp_ack+0x170a/0x1dd0
>> [] tcp_rcv_state_process+0x122/0xab0
>> [] tcp_v4_do_rcv+0xac/0x220
>> [] ? nf_iterate+0x5f/0x90
>> [] tcp_v4_rcv+0x586/0x6b0
>> [] ? nf_hook_slow+0x65/0xf0
>> [] ? ip_local_deliver_finish+0x0/0x120
>> [] ip_local_deliver_finish+0x5f/0x120
>> [] ip_local_deliver+0x3b/0x90
>> [] ip_rcv_finish+0x141/0x340
>> [] ip_rcv+0x24f/0x350
>> [] netif_receive_skb+0x20d/0x2f0
>> [] napi_skb_finish+0x40/0x50
>> [] napi_gro_receive+0x34/0x40
>> [] e1000_receive_skb+0x48/0x60
>> [] e1000_clean_rx_irq+0xf2/0x330
>> [] e1000_clean+0x81/0x2a0
>> [] ? ktime_get+0x11/0x50
>> [] net_rx_action+0x9c/0x130
>> [] ? get_next_timer_interrupt+0x1d0/0x210
>> [] __do_softirq+0xb7/0x160
>> [] call_softirq+0x1c/0x30
>> [] do_softirq+0x3d/0x80
>> [] irq_exit+0x7b/0x90
>> [] do_IRQ+0x73/0xe0
>> [] ret_from_intr+0x0/0xa
>> [] ? acpi_idle_enter_bm+0x245/0x271
>> [] ? acpi_idle_enter_bm+0x23b/0x271
>> [] ? cpuidle_idle_call+0x98/0xf0
>> [] ? cpu_idle+0x94/0xd0
>> [] ? rest_init+0x66/0x70
>> [] ? start_kernel+0x2ef/0x340
>> [] ? x86_64_start_reservations+0x84/0x90
>> [] ? x86_64_start_kernel+0xd2/0x100

Annotated disassembly of the faulting Code: bytes:

Code: 00 eb 28 8b 83 d0 03 00 00
41 39 44 24 40        cmp    %eax,0x40(%r12)
0f 89 00 01 00 00     jns    ...
41 0f b6 cd           movzbl %r13b,%ecx
41 bd 2f 00 00 00     mov    $0x2f000000,%r13d
83 e1 03              and    $0x3,%ecx
0f 84 fc 00 00 00     je     ...
4d 8b 24 24           mov    (%r12),%r12      skb = skb->next
<49> 8b 04 24         mov    (%r12),%rax      << NULL POINTER dereference >>
4d 39 f4              cmp    %r14,%r12
0f 18 08              prefetcht0 (%rax)
0f 84 d9 00 00 00     je     ...
4c 3b a3 b8 01        cmp

The crash is in:

void tcp_xmit_retransmit_queue(struct sock *sk)
{
	<< HERE >>  tcp_for_write_queue_from(skb, sk) {
	}
}

Some skb in sk_write_queue has a NULL ->next pointer.

The strange thing is that R14 and RAX = ffff8807e7420678 (&sk->sk_write_queue).
R14 is the stable value during the loop, while RAX is a scratch register.

I don't have full disassembly for this function, but I guess we had just
entered the loop (or RAX would be really different at this point).

So maybe the list head itself is corrupted (sk->sk_write_queue.next = NULL),

or is it a retransmit_skb_hint problem? (do we forget to set it to NULL in
some cases?)