From mboxrd@z Thu Jan 1 00:00:00 1970 From: Willy Tarreau Subject: Re: GRO + splice panics in 3.7.0-rc5 Date: Sat, 1 Dec 2012 20:43:04 +0100 Message-ID: <20121201194304.GI25450@1wt.eu> References: <20121115222812.GA647@1wt.eu> <1353023344.10798.8.camel@edumazet-glaptop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: netdev@vger.kernel.org To: Eric Dumazet Return-path: Received: from 1wt.eu ([62.212.114.60]:36366 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752873Ab2LATnM (ORCPT ); Sat, 1 Dec 2012 14:43:12 -0500 Content-Disposition: inline In-Reply-To: <1353023344.10798.8.camel@edumazet-glaptop> Sender: netdev-owner@vger.kernel.org List-ID: Hi Eric, On Thu, Nov 15, 2012 at 03:49:04PM -0800, Eric Dumazet wrote: > On Thu, 2012-11-15 at 23:28 +0100, Willy Tarreau wrote: > > Hello, > > > > I was just about to make a quick comparison between LRO and GRO in > > 3.7.0-rc5 to see if LRO still had the big advantage I've always observed, > > but I failed the test because as soon as I enable LRO + splice, the kernel > > panics and reboots. > > > > I could not yet manage to catch the panic output, I could just reliably > > reproduce it, it crashes instantly. > > > > All I can say at the moment is the following : > > - test consist in forwarding HTTP traffic between two NICs via haproxy > > - driver used was myri10ge > > - LRO + recv+send : OK > > - LRO + splice : OK > > - GRO + recv+send : OK > > - GRO + splice : panic > > - no such problem was observed in 3.6.6 so I think this is a recent > > regression. > > > > I'll go back digging for more information, but as I'm used to often see > > Eric suggest the right candidates for reverting, I wanted to report the > > issue here in case there are easy ones to try first. > > Hi Willy > > Nothing particular comes to mind, there were a lot of recent changes > that could trigger this kind of bug. > > A stack trace would be useful of course ;) First, sorry for the long delay, I would really have liked to debug this earlier. I could finally bisect the regression to this commit : 69b08f62e17439ee3d436faf0b9a7ca6fffb78db is the first bad commit commit 69b08f62e17439ee3d436faf0b9a7ca6fffb78db Author: Eric Dumazet Date: Wed Sep 26 06:46:57 2012 +0000 net: use bigger pages in __netdev_alloc_frag The regression is still present in latest git (7c17e486), and goes away if I revert this patch. I get this trace both on my 64-bit 10Gig machines with a myri10ge NIC, and on my 32-bit eeepc with an atl1c NIC at 100 Mbps, so this is not related to the driver nor the architecture. The trace looks like this on my eeepc, it crashes in tcp_sendpage() (more precisely in do_tcp_sendpages() which was inlined in the former) : BUG: unable to handle kernel NULL pointer dereference at 00000b50 IP: [] tcp_sendpage+0x28c/0x5d0 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss uvcvideo videobuf2_core videobuf2_vmalloc videobuf2_memops ath9k mac80211 snd_hda_codec_realtek microcode ath9k_common psmouse ath9k_hw serio_raw uhci_hcd ath sg cfg80211 snd_hda_intel lpc_ich ehci_hcd mfd_core snd_hda_codec atl1c snd_hwdep rtc_cmos snd_pcm snd_timer snd snd_page_alloc eeepc_laptop sparse_keymap evdev Pid: 2880, comm: haproxy Not tainted 3.7.0-rc7-eeepc #1 ASUSTeK Computer INC. 1005HA/1005HA EIP: 0060:[] EFLAGS: 00210246 CPU: 0 EIP is at tcp_sendpage+0x28c/0x5d0 EAX: c16a4138 EBX: f5202500 ECX: 00000b50 EDX: f5a72f68 ESI: f5333cc0 EDI: 000005a8 EBP: f55b7e34 ESP: f55b7df0 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: 00000b50 CR3: 35556000 CR4: 000007d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process haproxy (pid: 2880, ti=f55b6000 task=f62060c0 task.ti=f55b6000) Stack: 00000000 00000002 00000000 00000582 02000000 f52025a0 000005a8 00000002 00000b50 00001582 00000000 f6eb3900 00000b50 00000000 f5202500 c13fdd10 00000040 f55b7e5c c14201c7 000005a8 00000040 f55b7ecc f6eb3900 f5fc1680 Call Trace: [] ? sk_stream_alloc_skb+0xf0/0xf0 [] inet_sendpage+0x57/0xc0 [] ? inet_dgram_connect+0x80/0x80 [] kernel_sendpage+0x29/0x50 [] sock_sendpage+0x28/0x30 [] ? kernel_sendpage+0x50/0x50 [] pipe_to_sendpage+0x65/0x80 [] splice_from_pipe_feed+0x77/0x150 [] ? splice_from_pipe_begin+0x10/0x10 [] __splice_from_pipe+0x54/0x60 [] ? splice_from_pipe_begin+0x10/0x10 [] splice_from_pipe+0x50/0x70 [] ? default_file_splice_write+0x50/0x50 [] generic_splice_sendpage+0x22/0x30 [] ? splice_from_pipe_begin+0x10/0x10 [] do_splice_from+0x66/0x90 [] sys_splice+0x42f/0x480 [] syscall_call+0x7/0xb [] ? blk_iopoll_cpu_notify+0x2c/0x5d Code: 7d 08 0f 47 7d 08 39 f8 0f 4e f8 8b 43 1c 8b 40 60 85 c0 74 09 39 7b 64 0f 8c 21 01 00 00 80 7d ce 00 0f 85 ef 00 00 00 8b 4d dc <8b> 01 f6 c4 80 0f 85 0e 03 00 00 8b 45 dc f0 ff 40 10 8b 55 d8 EIP: [] tcp_sendpage+0x28c/0x5d0 SS:ESP 0068:f55b7df0 CR2: 0000000000000b50 ---[ end trace a07522433caf9655 ]--- In order to reproduce it, I'm just running haproxy with TCP splicing enabled. It simply forwards data from one TCP socket to another one. In case this matters, here's what the sequence looks like when it works (just relevant syscalls). fd 7 = client retrieving data, fd 8 = server providing data : socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8 fcntl64(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0 connect(8, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("10.8.3.1")}, 16) = -1 EINPROGRESS (Operation now in progress) send(8, "GET /?s=100k HTTP/1.1\r\nConnection"..., 120, MSG_DONTWAIT|MSG_NOSIGNAL) = 120 pipe([9, 10]) = 0 splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 5792 splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 1448 splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = -1 EAGAIN (Resource temporarily unavailable) splice(0x9, 0, 0x7, 0, 0x1c48, 0x3) = 7240 And the splice() block above repeats over and over with varying sizes. It's worth mentionning that the issue does not happend when recv/send are used instead of splice(). Changing the pipe size using F_SETPIPE_SZ does not change the behaviour either (was using 1 MB when it first appeared and thought it was the cause). I managed to catch the oops with strace running on the process. It crashed on the first splice() call on the sending side and confirms the stack trace : pipe([9, 10]) = 0 splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 10136 splice(0x9, 0, 0x7, 0, 0x2798, 0x3) ---> never completes, oops here. Are there any additional information I can provide to help debug this ? Seeing what the patch does, I don't think it's the direct responsible but that it triggers an existing bug somewhere else. However I don't know how to try to trigger the same bug without the patch. The corresponding disassembled code looks like this : 1ee3: 0f 47 7d 08 cmova 0x8(%ebp),%edi 1ee7: 39 f8 cmp %edi,%eax 1ee9: 0f 4e f8 cmovle %eax,%edi 1eec: 8b 43 1c mov 0x1c(%ebx),%eax 1eef: 8b 40 60 mov 0x60(%eax),%eax 1ef2: 85 c0 test %eax,%eax 1ef4: 74 09 je 1eff 1ef6: 39 7b 64 cmp %edi,0x64(%ebx) 1ef9: 0f 8c 21 01 00 00 jl 2020 1eff: 80 7d ce 00 cmpb $0x0,-0x32(%ebp) 1f03: 0f 85 ef 00 00 00 jne 1ff8 1f09: 8b 4d dc mov -0x24(%ebp),%ecx --> 1f0c: 8b 01 mov (%ecx),%eax 1f0e: f6 c4 80 test $0x80,%ah 1f11: 0f 85 0b 03 00 00 jne 2222 1f17: 8b 45 dc mov -0x24(%ebp),%eax 1f1a: f0 ff 40 10 lock incl 0x10(%eax) 1f1e: 8b 55 d8 mov -0x28(%ebp),%edx 1f21: 8b 4d dc mov -0x24(%ebp),%ecx 1f24: 8d 04 d5 20 00 00 00 lea 0x20(,%edx,8),%eax 1f2b: 03 86 90 00 00 00 add 0x90(%esi),%eax This seems to be at the beginning of this block, at new_segment : 866 int size = min_t(size_t, psize, PAGE_SIZE - offset); 867 bool can_coalesce; 868 869 if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) { 870 new_segment: 871 if (!sk_stream_memory_free(sk)) 872 goto wait_for_sndbuf; 873 I'm trying to add some debug code. Regards, Willy