* GRO + splice panics in 3.7.0-rc5
@ 2012-11-15 22:28 Willy Tarreau
2012-11-15 23:49 ` Eric Dumazet
0 siblings, 1 reply; 12+ messages in thread
From: Willy Tarreau @ 2012-11-15 22:28 UTC (permalink / raw)
To: netdev; +Cc: eric.dumazet
Hello,
I was just about to make a quick comparison between LRO and GRO in
3.7.0-rc5 to see if LRO still had the big advantage I've always observed,
but I failed the test because as soon as I enable GRO + splice, the kernel
panics and reboots.
I have not yet managed to catch the panic output; I can just reliably
reproduce it, and it crashes instantly.
All I can say at the moment is the following :
- test consists of forwarding HTTP traffic between two NICs via haproxy
- driver used was myri10ge
- LRO + recv+send : OK
- LRO + splice : OK
- GRO + recv+send : OK
- GRO + splice : panic
- no such problem was observed in 3.6.6 so I think this is a recent
regression.
I'll go back digging for more information, but as I'm used to seeing
Eric suggest the right candidates for reverting, I wanted to report the
issue here in case there are easy ones to try first.
Thanks,
Willy
^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: GRO + splice panics in 3.7.0-rc5
  2012-11-15 22:28 GRO + splice panics in 3.7.0-rc5 Willy Tarreau
@ 2012-11-15 23:49 ` Eric Dumazet
  2012-11-16  6:39   ` Willy Tarreau
  2012-12-01 19:43   ` Willy Tarreau
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-11-15 23:49 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Thu, 2012-11-15 at 23:28 +0100, Willy Tarreau wrote:
> Hello,
>
> I was just about to make a quick comparison between LRO and GRO in
> 3.7.0-rc5 to see if LRO still had the big advantage I've always observed,
> but I failed the test because as soon as I enable GRO + splice, the kernel
> panics and reboots.
>
> I have not yet managed to catch the panic output; I can just reliably
> reproduce it, and it crashes instantly.
>
> All I can say at the moment is the following :
> - test consists of forwarding HTTP traffic between two NICs via haproxy
> - driver used was myri10ge
> - LRO + recv+send : OK
> - LRO + splice : OK
> - GRO + recv+send : OK
> - GRO + splice : panic
> - no such problem was observed in 3.6.6 so I think this is a recent
>   regression.
>
> I'll go back digging for more information, but as I'm used to seeing
> Eric suggest the right candidates for reverting, I wanted to report the
> issue here in case there are easy ones to try first.

Hi Willy

Nothing particular comes to mind, there were a lot of recent changes
that could trigger this kind of bug.

A stack trace would be useful of course ;)

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: GRO + splice panics in 3.7.0-rc5
  2012-11-15 23:49 ` Eric Dumazet
@ 2012-11-16  6:39   ` Willy Tarreau
  2012-12-01 19:43   ` Willy Tarreau
  1 sibling, 0 replies; 12+ messages in thread
From: Willy Tarreau @ 2012-11-16 6:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hi Eric,

On Thu, Nov 15, 2012 at 03:49:04PM -0800, Eric Dumazet wrote:
> On Thu, 2012-11-15 at 23:28 +0100, Willy Tarreau wrote:
> > Hello,
> >
> > I was just about to make a quick comparison between LRO and GRO in
> > 3.7.0-rc5 to see if LRO still had the big advantage I've always observed,
> > but I failed the test because as soon as I enable GRO + splice, the kernel
> > panics and reboots.
> >
> > I have not yet managed to catch the panic output; I can just reliably
> > reproduce it, and it crashes instantly.
> >
> > All I can say at the moment is the following :
> > - test consists of forwarding HTTP traffic between two NICs via haproxy
> > - driver used was myri10ge
> > - LRO + recv+send : OK
> > - LRO + splice : OK
> > - GRO + recv+send : OK
> > - GRO + splice : panic
> > - no such problem was observed in 3.6.6 so I think this is a recent
> >   regression.
> >
> > I'll go back digging for more information, but as I'm used to seeing
> > Eric suggest the right candidates for reverting, I wanted to report the
> > issue here in case there are easy ones to try first.
>
> Hi Willy
>
> Nothing particular comes to mind, there were a lot of recent changes
> that could trigger this kind of bug.

OK, no problem.

> A stack trace would be useful of course ;)

I'll arrange to get one, then to bisect. I'll also try to reproduce it
with the onboard NIC (sky2) to see if this is something related to the
driver itself.

I can't promise anything before the end of the weekend, though.

Thanks !
Willy

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: GRO + splice panics in 3.7.0-rc5
  2012-11-15 23:49 ` Eric Dumazet
  2012-11-16  6:39   ` Willy Tarreau
@ 2012-12-01 19:43   ` Willy Tarreau
  2012-12-01 20:52     ` Willy Tarreau
  1 sibling, 1 reply; 12+ messages in thread
From: Willy Tarreau @ 2012-12-01 19:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hi Eric,

On Thu, Nov 15, 2012 at 03:49:04PM -0800, Eric Dumazet wrote:
> On Thu, 2012-11-15 at 23:28 +0100, Willy Tarreau wrote:
> > Hello,
> >
> > I was just about to make a quick comparison between LRO and GRO in
> > 3.7.0-rc5 to see if LRO still had the big advantage I've always observed,
> > but I failed the test because as soon as I enable GRO + splice, the kernel
> > panics and reboots.
> >
> > I have not yet managed to catch the panic output; I can just reliably
> > reproduce it, and it crashes instantly.
> >
> > All I can say at the moment is the following :
> > - test consists of forwarding HTTP traffic between two NICs via haproxy
> > - driver used was myri10ge
> > - LRO + recv+send : OK
> > - LRO + splice : OK
> > - GRO + recv+send : OK
> > - GRO + splice : panic
> > - no such problem was observed in 3.6.6 so I think this is a recent
> >   regression.
> >
> > I'll go back digging for more information, but as I'm used to seeing
> > Eric suggest the right candidates for reverting, I wanted to report the
> > issue here in case there are easy ones to try first.
>
> Hi Willy
>
> Nothing particular comes to mind, there were a lot of recent changes
> that could trigger this kind of bug.
>
> A stack trace would be useful of course ;)

First, sorry for the long delay, I would really have liked to debug this
earlier. I could finally bisect the regression to this commit :

69b08f62e17439ee3d436faf0b9a7ca6fffb78db is the first bad commit
commit 69b08f62e17439ee3d436faf0b9a7ca6fffb78db
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Sep 26 06:46:57 2012 +0000

    net: use bigger pages in __netdev_alloc_frag

The regression is still present in latest git (7c17e486), and goes away
if I revert this patch.

I get this trace both on my 64-bit 10Gig machines with a myri10ge NIC,
and on my 32-bit eeepc with an atl1c NIC at 100 Mbps, so this is not
related to the driver or the architecture.

The trace looks like this on my eeepc; it crashes in tcp_sendpage()
(more precisely in do_tcp_sendpages(), which was inlined in the former) :

BUG: unable to handle kernel NULL pointer dereference at 00000b50
IP: [<c13fdf9c>] tcp_sendpage+0x28c/0x5d0
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
 snd_seq_device snd_pcm_oss snd_mixer_oss uvcvideo videobuf2_core
 videobuf2_vmalloc videobuf2_memops ath9k mac80211 snd_hda_codec_realtek
 microcode ath9k_common psmouse ath9k_hw serio_raw uhci_hcd ath sg
 cfg80211 snd_hda_intel lpc_ich ehci_hcd mfd_core snd_hda_codec atl1c
 snd_hwdep rtc_cmos snd_pcm snd_timer snd snd_page_alloc eeepc_laptop
 sparse_keymap evdev
Pid: 2880, comm: haproxy Not tainted 3.7.0-rc7-eeepc #1 ASUSTeK Computer INC. 1005HA/1005HA
EIP: 0060:[<c13fdf9c>] EFLAGS: 00210246 CPU: 0
EIP is at tcp_sendpage+0x28c/0x5d0
EAX: c16a4138 EBX: f5202500 ECX: 00000b50 EDX: f5a72f68
ESI: f5333cc0 EDI: 000005a8 EBP: f55b7e34 ESP: f55b7df0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: 00000b50 CR3: 35556000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process haproxy (pid: 2880, ti=f55b6000 task=f62060c0 task.ti=f55b6000)
Stack:
 00000000 00000002 00000000 00000582 02000000 f52025a0 000005a8 00000002
 00000b50 00001582 00000000 f6eb3900 00000b50 00000000 f5202500 c13fdd10
 00000040 f55b7e5c c14201c7 000005a8 00000040 f55b7ecc f6eb3900 f5fc1680
Call Trace:
 [<c13fdd10>] ? sk_stream_alloc_skb+0xf0/0xf0
 [<c14201c7>] inet_sendpage+0x57/0xc0
 [<c1420170>] ? inet_dgram_connect+0x80/0x80
 [<c13b6d99>] kernel_sendpage+0x29/0x50
 [<c13b6de8>] sock_sendpage+0x28/0x30
 [<c13b6dc0>] ? kernel_sendpage+0x50/0x50
 [<c10e55b5>] pipe_to_sendpage+0x65/0x80
 [<c10e5647>] splice_from_pipe_feed+0x77/0x150
 [<c10e5550>] ? splice_from_pipe_begin+0x10/0x10
 [<c10e5a84>] __splice_from_pipe+0x54/0x60
 [<c10e5550>] ? splice_from_pipe_begin+0x10/0x10
 [<c10e7190>] splice_from_pipe+0x50/0x70
 [<c10e7200>] ? default_file_splice_write+0x50/0x50
 [<c10e7222>] generic_splice_sendpage+0x22/0x30
 [<c10e5550>] ? splice_from_pipe_begin+0x10/0x10
 [<c10e5af6>] do_splice_from+0x66/0x90
 [<c10e787f>] sys_splice+0x42f/0x480
 [<c14671cc>] syscall_call+0x7/0xb
 [<c1460000>] ? blk_iopoll_cpu_notify+0x2c/0x5d
Code: 7d 08 0f 47 7d 08 39 f8 0f 4e f8 8b 43 1c 8b 40 60 85 c0 74 09 39 7b 64 0f 8c 21 01 00 00 80 7d ce 00 0f 85 ef 00 00 00 8b 4d dc <8b> 01 f6 c4 80 0f 85 0e 03 00 00 8b 45 dc f0 ff 40 10 8b 55 d8
EIP: [<c13fdf9c>] tcp_sendpage+0x28c/0x5d0 SS:ESP 0068:f55b7df0
CR2: 0000000000000b50
---[ end trace a07522433caf9655 ]---

In order to reproduce it, I'm just running haproxy with TCP splicing
enabled. It simply forwards data from one TCP socket to another one.
In case this matters, here's what the sequence looks like when it works
(just relevant syscalls). fd 7 = client retrieving data, fd 8 = server
providing data :

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
fcntl64(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(8000),
    sin_addr=inet_addr("10.8.3.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
send(8, "GET /?s=100k HTTP/1.1\r\nConnection"..., 120, MSG_DONTWAIT|MSG_NOSIGNAL) = 120
pipe([9, 10]) = 0
splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 5792
splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 1448
splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = -1 EAGAIN (Resource temporarily unavailable)
splice(0x9, 0, 0x7, 0, 0x1c48, 0x3) = 7240

And the splice() block above repeats over and over with varying sizes.

It's worth mentioning that the issue does not happen when recv/send are
used instead of splice(). Changing the pipe size using F_SETPIPE_SZ does
not change the behaviour either (I was using 1 MB when it first appeared
and thought it was the cause).

I managed to catch the oops with strace running on the process. It
crashed on the first splice() call on the sending side, which confirms
the stack trace :

pipe([9, 10]) = 0
splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 10136
splice(0x9, 0, 0x7, 0, 0x2798, 0x3)   ---> never completes, oops here.

Is there any additional information I can provide to help debug this ?
Seeing what the patch does, I don't think it's directly responsible, but
rather that it triggers an existing bug somewhere else. However, I don't
know how to trigger the same bug without the patch.

The corresponding disassembled code looks like this :

    1ee3:  0f 47 7d 08             cmova  0x8(%ebp),%edi
    1ee7:  39 f8                   cmp    %edi,%eax
    1ee9:  0f 4e f8                cmovle %eax,%edi
    1eec:  8b 43 1c                mov    0x1c(%ebx),%eax
    1eef:  8b 40 60                mov    0x60(%eax),%eax
    1ef2:  85 c0                   test   %eax,%eax
    1ef4:  74 09                   je     1eff <tcp_sendpage+0x27f>
    1ef6:  39 7b 64                cmp    %edi,0x64(%ebx)
    1ef9:  0f 8c 21 01 00 00       jl     2020 <tcp_sendpage+0x3a0>
    1eff:  80 7d ce 00             cmpb   $0x0,-0x32(%ebp)
    1f03:  0f 85 ef 00 00 00       jne    1ff8 <tcp_sendpage+0x378>
    1f09:  8b 4d dc                mov    -0x24(%ebp),%ecx
--> 1f0c:  8b 01                   mov    (%ecx),%eax
    1f0e:  f6 c4 80                test   $0x80,%ah
    1f11:  0f 85 0b 03 00 00       jne    2222 <tcp_sendpage+0x5a2>
    1f17:  8b 45 dc                mov    -0x24(%ebp),%eax
    1f1a:  f0 ff 40 10             lock incl 0x10(%eax)
    1f1e:  8b 55 d8                mov    -0x28(%ebp),%edx
    1f21:  8b 4d dc                mov    -0x24(%ebp),%ecx
    1f24:  8d 04 d5 20 00 00 00    lea    0x20(,%edx,8),%eax
    1f2b:  03 86 90 00 00 00       add    0x90(%esi),%eax

This seems to be at the beginning of this block, at new_segment :

866                     int size = min_t(size_t, psize, PAGE_SIZE - offset);
867                     bool can_coalesce;
868
869                     if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
870 new_segment:
871                             if (!sk_stream_memory_free(sk))
872                                     goto wait_for_sndbuf;
873

I'm trying to add some debug code.

Regards,
Willy

^ permalink raw reply	[flat|nested] 12+ messages in thread
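The bisection above follows the usual git workflow; a minimal sketch of
it, assuming v3.6 as the known-good baseline in Linus' tree (the exact
refs used are not stated in the thread) :

    git bisect start
    git bisect bad v3.7-rc5      # GRO + splice panics here
    git bisect good v3.6         # the 3.6 series was reported fine
    # at each step: build and boot the candidate kernel, run the
    # splice test, then mark the result:
    git bisect good              # or `git bisect bad` if it panicked
    # repeat until git names the first bad commit (69b08f62 here)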
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 19:43   ` Willy Tarreau
@ 2012-12-01 20:52     ` Willy Tarreau
  2012-12-01 21:47       ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Willy Tarreau @ 2012-12-01 20:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 2878 bytes --]

OK, now I have a simple and reliable reproducer for the bug. It just
creates 2 TCP sockets connected to each other and splices from one to
the other. It looks 100% reliable. I'm attaching it, hoping it can help;
it's much easier and more reliable than starting haproxy with clients
and servers!

I'm noting that it only crashes when I send more than one page of data.
4096 bytes and below are OK, 4097 and above do crash.

I have added a few printk() calls and it dies in get_page() called from
do_tcp_sendpages() :

888                     i = skb_shinfo(skb)->nr_frags;
889                     printk(KERN_DEBUG "%d: page=%p i=%d skb=%p offset=%d\n", __LINE__, page, i, skb, offset);
890                     can_coalesce = skb_can_coalesce(skb, i, page, offset);
891                     printk(KERN_DEBUG "%d: can_coalesce=%d\n", __LINE__, can_coalesce);
892                     if (!can_coalesce && i >= MAX_SKB_FRAGS) {
893                             printk(KERN_DEBUG "%d\n", __LINE__);
894                             tcp_mark_push(tp, skb);
895                             goto new_segment;
896                     }
897                     if (!sk_wmem_schedule(sk, copy)) {
898                             printk(KERN_DEBUG "%d\n", __LINE__);
899                             goto wait_for_memory;
900                     }
901
902                     if (can_coalesce) {
903                             printk(KERN_DEBUG "%d\n", __LINE__);
904                             skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
905                     } else {
906                             printk(KERN_DEBUG "%d\n", __LINE__);
907                             get_page(page);
908                             printk(KERN_DEBUG "%d\n", __LINE__);
909                             skb_fill_page_desc(skb, i, page, offset, copy);
910                     }
911
912                     printk(KERN_DEBUG "%d\n", __LINE__);
913                     skb->len += copy;
914                     skb->data_len += copy;

dmesg starts with :

889: page=f77b9ca0 i=0 skb=f5b91540 offset=0
891: can_coalesce=0
906
908
912
871
875
880
889: page=f6e7a300 i=0 skb=f46f5c80 offset=2513
891: can_coalesce=0
906
908
912
889: page=0000fb80 i=1 skb=f46f5c80 offset=0
891: can_coalesce=0
906
BUG: unable to handle kernel paging request at 0000fb80
IP: [<c13b4878>] tcp_sendpage+0x568/0x6d0

If that can help, I also noticed this one once with a page that could
look valid, so the error was propagated further (the send() was 64kB) :

...
906
908
912
889: page=f6e79f80 i=6 skb=f37a1c80 offset=0
891: can_coalesce=0
906
908
912
889: page=c13d58a7 i=7 skb=f37a1c80 offset=927
891: can_coalesce=0
906
908
912
889: page=c13b4310 i=8 skb=f37a1c80 offset=3877
891: can_coalesce=0
906
BUG: unable to handle kernel paging request at 8b31752a
IP: [<c1094371>] __get_page_tail+0x31/0x90
*pde = 00000000

Now I'm really at a loss, so do not hesitate to ask me for more info,
because I don't know where to look.

Cheers,
Willy

[-- Attachment #2: splice-bug.c --]
[-- Type: text/plain, Size: 907 bytes --]

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void fail(const char *e)
{
	perror(e);
	exit(1);
}

int main(int argc, char **argv)
{
	struct sockaddr sa;
	socklen_t sl;
	int p[2];
	int l, s, r;
	char buffer[4097];

	l = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (l < 0)
		fail("L: socket()");

	if (listen(l, 1) < 0)
		fail("L: listen()");

	sl = sizeof(sa);
	if (getsockname(l, &sa, &sl) < 0)
		fail("L: getsockname()");

	if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
		fail("S: socket()");

	if (connect(s, &sa, sl) < 0)
		fail("S: connect()");

	sl = sizeof(sa);
	if ((r = accept(l, &sa, &sl)) < 0)
		fail("R: accept()");

	pipe(p);
	send(s, buffer, sizeof(buffer), MSG_DONTWAIT);
	l = splice(r, NULL, p[1], NULL, 1048576, 3);
	splice(p[0], NULL, s, NULL, l, 3);
	exit(0);
}

^ permalink raw reply	[flat|nested] 12+ messages in thread
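The attached reproducer is self-contained; a typical way to build and
run it would be the following (the compiler invocation is assumed, not
part of the thread) :

    cc -Wall -o splice-bug splice-bug.c
    ./splice-bug     # on an affected kernel, expect an immediate oops

Note the 4097-byte buffer: it is just over one page, which matches the
observation that sends of 4096 bytes or less do not crash.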
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 20:52     ` Willy Tarreau
@ 2012-12-01 21:47       ` Eric Dumazet
  2012-12-01 22:32         ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2012-12-01 21:47 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Sat, 2012-12-01 at 21:52 +0100, Willy Tarreau wrote:
> OK, now I have a simple and reliable reproducer for the bug. It just
> creates 2 TCP sockets connected to each other and splices from one to
> the other. It looks 100% reliable. I'm attaching it, hoping it can help;
> it's much easier and more reliable than starting haproxy with clients
> and servers!
>
> I'm noting that it only crashes when I send more than one page of data.
> 4096 bytes and below are OK, 4097 and above do crash.
>
> I have added a few printk() calls and it dies in get_page() called from
> do_tcp_sendpages() :
>
> 888                     i = skb_shinfo(skb)->nr_frags;
> 889                     printk(KERN_DEBUG "%d: page=%p i=%d skb=%p offset=%d\n", __LINE__, page, i, skb, offset);
> 890                     can_coalesce = skb_can_coalesce(skb, i, page, offset);
> 891                     printk(KERN_DEBUG "%d: can_coalesce=%d\n", __LINE__, can_coalesce);
> 892                     if (!can_coalesce && i >= MAX_SKB_FRAGS) {
> 893                             printk(KERN_DEBUG "%d\n", __LINE__);
> 894                             tcp_mark_push(tp, skb);
> 895                             goto new_segment;
> 896                     }
> 897                     if (!sk_wmem_schedule(sk, copy)) {
> 898                             printk(KERN_DEBUG "%d\n", __LINE__);
> 899                             goto wait_for_memory;
> 900                     }
> 901
> 902                     if (can_coalesce) {
> 903                             printk(KERN_DEBUG "%d\n", __LINE__);
> 904                             skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
> 905                     } else {
> 906                             printk(KERN_DEBUG "%d\n", __LINE__);
> 907                             get_page(page);
> 908                             printk(KERN_DEBUG "%d\n", __LINE__);
> 909                             skb_fill_page_desc(skb, i, page, offset, copy);
> 910                     }
> 911
> 912                     printk(KERN_DEBUG "%d\n", __LINE__);
> 913                     skb->len += copy;
> 914                     skb->data_len += copy;
>
> dmesg starts with :
>
> 889: page=f77b9ca0 i=0 skb=f5b91540 offset=0
> 891: can_coalesce=0
> 906
> 908
> 912
> 871
> 875
> 880
> 889: page=f6e7a300 i=0 skb=f46f5c80 offset=2513
> 891: can_coalesce=0
> 906
> 908
> 912
> 889: page=0000fb80 i=1 skb=f46f5c80 offset=0
> 891: can_coalesce=0
> 906
> BUG: unable to handle kernel paging request at 0000fb80
> IP: [<c13b4878>] tcp_sendpage+0x568/0x6d0
>
> If that can help, I also noticed this one once with a page that could
> look valid, so the error was propagated further (the send() was 64kB) :
>
> ...
> 906
> 908
> 912
> 889: page=f6e79f80 i=6 skb=f37a1c80 offset=0
> 891: can_coalesce=0
> 906
> 908
> 912
> 889: page=c13d58a7 i=7 skb=f37a1c80 offset=927
> 891: can_coalesce=0
> 906
> 908
> 912
> 889: page=c13b4310 i=8 skb=f37a1c80 offset=3877
> 891: can_coalesce=0
> 906
> BUG: unable to handle kernel paging request at 8b31752a
> IP: [<c1094371>] __get_page_tail+0x31/0x90
> *pde = 00000000
>
> Now I'm really at a loss, so do not hesitate to ask me for more info,
> because I don't know where to look.

Thanks a lot Willy

I believe do_tcp_sendpages() needs a fix, I'll send a patch asap

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 21:47       ` Eric Dumazet
@ 2012-12-01 22:32         ` Eric Dumazet
  2012-12-01 22:40           ` Willy Tarreau
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2012-12-01 22:32 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Sat, 2012-12-01 at 13:47 -0800, Eric Dumazet wrote:
> Thanks a lot Willy
>
> I believe do_tcp_sendpages() needs a fix, I'll send a patch asap
>

Could you try the following patch ?

do_tcp_sendpages() looks really wrong, as only one page is provided by
the caller.

Thanks !

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e6eace1..6976dba 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -831,8 +831,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
-			 size_t psize, int flags)
+static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+				size_t size, int flags)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int mss_now, size_goal;
@@ -859,12 +859,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto out_err;
 
-	while (psize > 0) {
+	while (size > 0) {
 		struct sk_buff *skb = tcp_write_queue_tail(sk);
-		struct page *page = pages[poffset / PAGE_SIZE];
 		int copy, i;
-		int offset = poffset % PAGE_SIZE;
-		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 		bool can_coalesce;
 
 		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
@@ -913,8 +910,8 @@ new_segment:
 			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
 
 		copied += copy;
-		poffset += copy;
-		if (!(psize -= copy))
+		offset += copy;
+		if (!(size -= copy))
 			goto out;
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
@@ -961,7 +958,7 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 					flags);
 
 	lock_sock(sk);
-	res = do_tcp_sendpages(sk, &page, offset, size, flags);
+	res = do_tcp_sendpages(sk, page, offset, size, flags);
 	release_sock(sk);
 	return res;
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread
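The failure mechanism is easy to see once the removed lines are laid
out together. A condensed sketch of the pre-fix logic follows (an
illustration assembled from the lines the patch deletes, not the
complete kernel source) :

	/*
	 * The only caller, tcp_sendpage(), did:
	 *
	 *	res = do_tcp_sendpages(sk, &page, offset, size, flags);
	 *
	 * i.e. it passed the address of a SINGLE page pointer, yet the
	 * loop below indexed it as an array of order-0 pages.  Once GRO
	 * started handing over high-order pages, offset + size could
	 * exceed PAGE_SIZE, the index became >= 1, and the load read
	 * whatever sat next to `page` on the caller's stack, matching
	 * the garbage "page=0000fb80" value in the oops above.
	 */
	while (psize > 0) {
		struct page *page = pages[poffset / PAGE_SIZE];	/* out-of-bounds read */
		int offset = poffset % PAGE_SIZE;
		int size = min_t(size_t, psize, PAGE_SIZE - offset);
		...
		get_page(page);		/* crashes on the garbage pointer */
		...
	}

The fix simply passes the single page straight through and lets offset
run past PAGE_SIZE within it, which is valid for a high-order page.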
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 22:32         ` Eric Dumazet
@ 2012-12-01 22:40           ` Willy Tarreau
  2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
  2012-12-02  2:58             ` GRO + splice panics in 3.7.0-rc5 Eric Dumazet
  0 siblings, 2 replies; 12+ messages in thread
From: Willy Tarreau @ 2012-12-01 22:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Sat, Dec 01, 2012 at 02:32:01PM -0800, Eric Dumazet wrote:
> On Sat, 2012-12-01 at 13:47 -0800, Eric Dumazet wrote:
>
> > Thanks a lot Willy
> >
> > I believe do_tcp_sendpages() needs a fix, I'll send a patch asap
> >
>
> Could you try the following patch ?
>
> do_tcp_sendpages() looks really wrong, as only one page is provided by
> the caller.

Excellent Eric, it's rock solid now, both with the reproducer and with
haproxy! Feel free to add my Tested-by if you want.

I really think we should feed this to Linus quickly before he releases
3.7, as I'm realizing that it would really not be nice to have a
user-triggerable crash in a release :-/

Thanks !
Willy

^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH net] tcp: fix crashes in do_tcp_sendpages()
  2012-12-01 22:40           ` Willy Tarreau
@ 2012-12-01 23:07             ` Eric Dumazet
  2012-12-02  1:41               ` David Miller
  2012-12-02  1:43               ` Joe Perches
  1 sibling, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-12-01 23:07 UTC (permalink / raw)
  To: Willy Tarreau, David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Recent network changes allowed high order pages to be used
for skb fragments.

This uncovered a bug in do_tcp_sendpages() which was assuming its caller
provided an array of order-0 page pointers.

We only have to deal with a single page in this function, and its order
is irrelevant.

Reported-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 083092e..e457c7a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -830,8 +830,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
-			 size_t psize, int flags)
+static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+				size_t size, int flags)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int mss_now, size_goal;
@@ -858,12 +858,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto out_err;
 
-	while (psize > 0) {
+	while (size > 0) {
 		struct sk_buff *skb = tcp_write_queue_tail(sk);
-		struct page *page = pages[poffset / PAGE_SIZE];
 		int copy, i;
-		int offset = poffset % PAGE_SIZE;
-		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 		bool can_coalesce;
 
 		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
@@ -912,8 +909,8 @@ new_segment:
 			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
 
 		copied += copy;
-		poffset += copy;
-		if (!(psize -= copy))
+		offset += copy;
+		if (!(size -= copy))
 			goto out;
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
@@ -960,7 +957,7 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 					flags);
 
 	lock_sock(sk);
-	res = do_tcp_sendpages(sk, &page, offset, size, flags);
+	res = do_tcp_sendpages(sk, page, offset, size, flags);
 	release_sock(sk);
 	return res;
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH net] tcp: fix crashes in do_tcp_sendpages()
  2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
@ 2012-12-02  1:41               ` David Miller
  2012-12-02  1:43               ` Joe Perches
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2012-12-02 1:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: w, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 01 Dec 2012 15:07:02 -0800

> From: Eric Dumazet <edumazet@google.com>
>
> Recent network changes allowed high order pages to be used
> for skb fragments.
>
> This uncovered a bug in do_tcp_sendpages() which was assuming its caller
> provided an array of order-0 page pointers.
>
> We only have to deal with a single page in this function, and its order
> is irrelevant.
>
> Reported-by: Willy Tarreau <w@1wt.eu>
> Tested-by: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH net] tcp: fix crashes in do_tcp_sendpages()
  2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
  2012-12-02  1:41               ` David Miller
@ 2012-12-02  1:43               ` Joe Perches
  1 sibling, 0 replies; 12+ messages in thread
From: Joe Perches @ 2012-12-02 1:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, David Miller, netdev

On Sat, 2012-12-01 at 15:07 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Recent network changes allowed high order pages to be used
> for skb fragments.
>
> This uncovered a bug in do_tcp_sendpages() which was assuming its caller
> provided an array of order-0 page pointers.
>
> We only have to deal with a single page in this function, and its order
> is irrelevant.
[]
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
[]
> @@ -830,8 +830,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>  	return mss_now;
>  }
>  
> -static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
> -			 size_t psize, int flags)
> +static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
> +				size_t size, int flags)

Shouldn't this now be named do_tcp_sendpage ?

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 22:40           ` Willy Tarreau
  2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
@ 2012-12-02  2:58             ` Eric Dumazet
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-12-02 2:58 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Sat, 2012-12-01 at 23:40 +0100, Willy Tarreau wrote:

> Excellent Eric, it's rock solid now, both with the reproducer and with
> haproxy! Feel free to add my Tested-by if you want.

You did very good work, thanks again Willy.

^ permalink raw reply	[flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-12-02  2:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-15 22:28 GRO + splice panics in 3.7.0-rc5 Willy Tarreau
2012-11-15 23:49 ` Eric Dumazet
2012-11-16  6:39   ` Willy Tarreau
2012-12-01 19:43   ` Willy Tarreau
2012-12-01 20:52     ` Willy Tarreau
2012-12-01 21:47       ` Eric Dumazet
2012-12-01 22:32         ` Eric Dumazet
2012-12-01 22:40           ` Willy Tarreau
2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
2012-12-02  1:41               ` David Miller
2012-12-02  1:43               ` Joe Perches
2012-12-02  2:58             ` GRO + splice panics in 3.7.0-rc5 Eric Dumazet