* GRO + splice panics in 3.7.0-rc5
@ 2012-11-15 22:28 Willy Tarreau
2012-11-15 23:49 ` Eric Dumazet
0 siblings, 1 reply; 12+ messages in thread
From: Willy Tarreau @ 2012-11-15 22:28 UTC (permalink / raw)
To: netdev; +Cc: eric.dumazet
Hello,
I was just about to make a quick comparison between LRO and GRO in
3.7.0-rc5 to see if LRO still had the big advantage I've always observed,
but I failed the test because as soon as I enable GRO + splice, the kernel
panics and reboots.
I have not yet managed to catch the panic output; I can just reliably
reproduce it, and it crashes instantly.
All I can say at the moment is the following :
- test consists of forwarding HTTP traffic between two NICs via haproxy
- driver used was myri10ge
- LRO + recv+send : OK
- LRO + splice : OK
- GRO + recv+send : OK
- GRO + splice : panic
- no such problem was observed in 3.6.6 so I think this is a recent
regression.
I'll go back digging for more information, but as I'm used to seeing
Eric suggest the right candidates for reverting, I wanted to report the
issue here in case there are easy ones to try first.
Thanks,
Willy
^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: GRO + splice panics in 3.7.0-rc5
  2012-11-15 22:28 GRO + splice panics in 3.7.0-rc5 Willy Tarreau
@ 2012-11-15 23:49 ` Eric Dumazet
  2012-11-16  6:39   ` Willy Tarreau
  2012-12-01 19:43   ` Willy Tarreau
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-11-15 23:49 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Thu, 2012-11-15 at 23:28 +0100, Willy Tarreau wrote:
> Hello,
>
> I was just about to make a quick comparison between LRO and GRO in
> 3.7.0-rc5 to see if LRO still had the big advantage I've always observed,
> but I failed the test because as soon as I enable GRO + splice, the kernel
> panics and reboots.
>
> I have not yet managed to catch the panic output; I can just reliably
> reproduce it, and it crashes instantly.
>
> All I can say at the moment is the following :
> - test consists of forwarding HTTP traffic between two NICs via haproxy
> - driver used was myri10ge
> - LRO + recv+send : OK
> - LRO + splice : OK
> - GRO + recv+send : OK
> - GRO + splice : panic
> - no such problem was observed in 3.6.6 so I think this is a recent
>   regression.
>
> I'll go back digging for more information, but as I'm used to seeing
> Eric suggest the right candidates for reverting, I wanted to report the
> issue here in case there are easy ones to try first.

Hi Willy

Nothing particular comes to mind, there were a lot of recent changes
that could trigger this kind of bug.

A stack trace would be useful of course ;)

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: GRO + splice panics in 3.7.0-rc5
  2012-11-15 23:49 ` Eric Dumazet
@ 2012-11-16  6:39   ` Willy Tarreau
  2012-12-01 19:43   ` Willy Tarreau
  1 sibling, 0 replies; 12+ messages in thread
From: Willy Tarreau @ 2012-11-16 6:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hi Eric,

On Thu, Nov 15, 2012 at 03:49:04PM -0800, Eric Dumazet wrote:
> On Thu, 2012-11-15 at 23:28 +0100, Willy Tarreau wrote:
> > Hello,
> >
> > I was just about to make a quick comparison between LRO and GRO in
> > 3.7.0-rc5 to see if LRO still had the big advantage I've always observed,
> > but I failed the test because as soon as I enable GRO + splice, the kernel
> > panics and reboots.
> >
> > I have not yet managed to catch the panic output; I can just reliably
> > reproduce it, and it crashes instantly.
> >
> > All I can say at the moment is the following :
> > - test consists of forwarding HTTP traffic between two NICs via haproxy
> > - driver used was myri10ge
> > - LRO + recv+send : OK
> > - LRO + splice : OK
> > - GRO + recv+send : OK
> > - GRO + splice : panic
> > - no such problem was observed in 3.6.6 so I think this is a recent
> >   regression.
> >
> > I'll go back digging for more information, but as I'm used to seeing
> > Eric suggest the right candidates for reverting, I wanted to report the
> > issue here in case there are easy ones to try first.
>
> Hi Willy
>
> Nothing particular comes to mind, there were a lot of recent changes
> that could trigger this kind of bug.

OK, no problem.

> A stack trace would be useful of course ;)

I'll arrange to get one, then to bisect. I'll also try to reproduce it
with the onboard NIC (sky2) to see if this is something related to the
driver itself.

I can't promise anything before the end of the weekend, though.

Thanks !
Willy

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: GRO + splice panics in 3.7.0-rc5
  2012-11-15 23:49 ` Eric Dumazet
  2012-11-16  6:39   ` Willy Tarreau
@ 2012-12-01 19:43   ` Willy Tarreau
  2012-12-01 20:52     ` Willy Tarreau
  1 sibling, 1 reply; 12+ messages in thread
From: Willy Tarreau @ 2012-12-01 19:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hi Eric,

On Thu, Nov 15, 2012 at 03:49:04PM -0800, Eric Dumazet wrote:
> On Thu, 2012-11-15 at 23:28 +0100, Willy Tarreau wrote:
> > Hello,
> >
> > I was just about to make a quick comparison between LRO and GRO in
> > 3.7.0-rc5 to see if LRO still had the big advantage I've always observed,
> > but I failed the test because as soon as I enable GRO + splice, the kernel
> > panics and reboots.
> >
> > I have not yet managed to catch the panic output; I can just reliably
> > reproduce it, and it crashes instantly.
> >
> > All I can say at the moment is the following :
> > - test consists of forwarding HTTP traffic between two NICs via haproxy
> > - driver used was myri10ge
> > - LRO + recv+send : OK
> > - LRO + splice : OK
> > - GRO + recv+send : OK
> > - GRO + splice : panic
> > - no such problem was observed in 3.6.6 so I think this is a recent
> >   regression.
> >
> > I'll go back digging for more information, but as I'm used to seeing
> > Eric suggest the right candidates for reverting, I wanted to report the
> > issue here in case there are easy ones to try first.
>
> Hi Willy
>
> Nothing particular comes to mind, there were a lot of recent changes
> that could trigger this kind of bug.
>
> A stack trace would be useful of course ;)

First, sorry for the long delay, I would really have liked to debug this
earlier. I could finally bisect the regression to this commit :

69b08f62e17439ee3d436faf0b9a7ca6fffb78db is the first bad commit
commit 69b08f62e17439ee3d436faf0b9a7ca6fffb78db
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Sep 26 06:46:57 2012 +0000

    net: use bigger pages in __netdev_alloc_frag

The regression is still present in latest git (7c17e486), and goes away
if I revert this patch.

I get this trace both on my 64-bit 10Gig machines with a myri10ge NIC,
and on my 32-bit eeepc with an atl1c NIC at 100 Mbps, so this is not
related to the driver or the architecture.

The trace looks like this on my eeepc; it crashes in tcp_sendpage()
(more precisely in do_tcp_sendpages(), which was inlined in the former) :

BUG: unable to handle kernel NULL pointer dereference at 00000b50
IP: [<c13fdf9c>] tcp_sendpage+0x28c/0x5d0
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
 snd_seq_device snd_pcm_oss snd_mixer_oss uvcvideo videobuf2_core
 videobuf2_vmalloc videobuf2_memops ath9k mac80211 snd_hda_codec_realtek
 microcode ath9k_common psmouse ath9k_hw serio_raw uhci_hcd ath sg
 cfg80211 snd_hda_intel lpc_ich ehci_hcd mfd_core snd_hda_codec atl1c
 snd_hwdep rtc_cmos snd_pcm snd_timer snd snd_page_alloc eeepc_laptop
 sparse_keymap evdev
Pid: 2880, comm: haproxy Not tainted 3.7.0-rc7-eeepc #1 ASUSTeK Computer INC. 1005HA/1005HA
EIP: 0060:[<c13fdf9c>] EFLAGS: 00210246 CPU: 0
EIP is at tcp_sendpage+0x28c/0x5d0
EAX: c16a4138 EBX: f5202500 ECX: 00000b50 EDX: f5a72f68
ESI: f5333cc0 EDI: 000005a8 EBP: f55b7e34 ESP: f55b7df0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: 00000b50 CR3: 35556000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process haproxy (pid: 2880, ti=f55b6000 task=f62060c0 task.ti=f55b6000)
Stack:
 00000000 00000002 00000000 00000582 02000000 f52025a0 000005a8 00000002
 00000b50 00001582 00000000 f6eb3900 00000b50 00000000 f5202500 c13fdd10
 00000040 f55b7e5c c14201c7 000005a8 00000040 f55b7ecc f6eb3900 f5fc1680
Call Trace:
 [<c13fdd10>] ? sk_stream_alloc_skb+0xf0/0xf0
 [<c14201c7>] inet_sendpage+0x57/0xc0
 [<c1420170>] ? inet_dgram_connect+0x80/0x80
 [<c13b6d99>] kernel_sendpage+0x29/0x50
 [<c13b6de8>] sock_sendpage+0x28/0x30
 [<c13b6dc0>] ? kernel_sendpage+0x50/0x50
 [<c10e55b5>] pipe_to_sendpage+0x65/0x80
 [<c10e5647>] splice_from_pipe_feed+0x77/0x150
 [<c10e5550>] ? splice_from_pipe_begin+0x10/0x10
 [<c10e5a84>] __splice_from_pipe+0x54/0x60
 [<c10e5550>] ? splice_from_pipe_begin+0x10/0x10
 [<c10e7190>] splice_from_pipe+0x50/0x70
 [<c10e7200>] ? default_file_splice_write+0x50/0x50
 [<c10e7222>] generic_splice_sendpage+0x22/0x30
 [<c10e5550>] ? splice_from_pipe_begin+0x10/0x10
 [<c10e5af6>] do_splice_from+0x66/0x90
 [<c10e787f>] sys_splice+0x42f/0x480
 [<c14671cc>] syscall_call+0x7/0xb
 [<c1460000>] ? blk_iopoll_cpu_notify+0x2c/0x5d
Code: 7d 08 0f 47 7d 08 39 f8 0f 4e f8 8b 43 1c 8b 40 60 85 c0 74 09 39 7b 64 0f 8c 21 01 00 00 80 7d ce 00 0f 85 ef 00 00 00 8b 4d dc <8b> 01 f6 c4 80 0f 85 0e 03 00 00 8b 45 dc f0 ff 40 10 8b 55 d8
EIP: [<c13fdf9c>] tcp_sendpage+0x28c/0x5d0 SS:ESP 0068:f55b7df0
CR2: 0000000000000b50
---[ end trace a07522433caf9655 ]---

In order to reproduce it, I'm just running haproxy with TCP splicing
enabled. It simply forwards data from one TCP socket to another one.
In case this matters, here's what the sequence looks like when it works
(just relevant syscalls). fd 7 = client retrieving data, fd 8 = server
providing data :

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
fcntl64(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(8000),
    sin_addr=inet_addr("10.8.3.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
send(8, "GET /?s=100k HTTP/1.1\r\nConnection"..., 120, MSG_DONTWAIT|MSG_NOSIGNAL) = 120
pipe([9, 10]) = 0
splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 5792
splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 1448
splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = -1 EAGAIN (Resource temporarily unavailable)
splice(0x9, 0, 0x7, 0, 0x1c48, 0x3) = 7240

And the splice() block above repeats over and over with varying sizes.

It's worth mentioning that the issue does not happen when recv/send are
used instead of splice(). Changing the pipe size using F_SETPIPE_SZ does
not change the behaviour either (I was using 1 MB when it first appeared
and thought it was the cause).

I managed to catch the oops with strace running on the process. It
crashed on the first splice() call on the sending side, which confirms
the stack trace :

pipe([9, 10]) = 0
splice(0x8, 0, 0xa, 0, 0x40000000, 0x3) = 10136
splice(0x9, 0, 0x7, 0, 0x2798, 0x3)   ---> never completes, oops here.

Is there any additional information I can provide to help debug this ?
Seeing what the patch does, I don't think it's directly responsible, but
rather that it triggers an existing bug somewhere else. However, I don't
know how to trigger the same bug without the patch.

The corresponding disassembled code looks like this :

    1ee3:  0f 47 7d 08             cmova  0x8(%ebp),%edi
    1ee7:  39 f8                   cmp    %edi,%eax
    1ee9:  0f 4e f8                cmovle %eax,%edi
    1eec:  8b 43 1c                mov    0x1c(%ebx),%eax
    1eef:  8b 40 60                mov    0x60(%eax),%eax
    1ef2:  85 c0                   test   %eax,%eax
    1ef4:  74 09                   je     1eff <tcp_sendpage+0x27f>
    1ef6:  39 7b 64                cmp    %edi,0x64(%ebx)
    1ef9:  0f 8c 21 01 00 00       jl     2020 <tcp_sendpage+0x3a0>
    1eff:  80 7d ce 00             cmpb   $0x0,-0x32(%ebp)
    1f03:  0f 85 ef 00 00 00       jne    1ff8 <tcp_sendpage+0x378>
    1f09:  8b 4d dc                mov    -0x24(%ebp),%ecx
--> 1f0c:  8b 01                   mov    (%ecx),%eax
    1f0e:  f6 c4 80                test   $0x80,%ah
    1f11:  0f 85 0b 03 00 00       jne    2222 <tcp_sendpage+0x5a2>
    1f17:  8b 45 dc                mov    -0x24(%ebp),%eax
    1f1a:  f0 ff 40 10             lock incl 0x10(%eax)
    1f1e:  8b 55 d8                mov    -0x28(%ebp),%edx
    1f21:  8b 4d dc                mov    -0x24(%ebp),%ecx
    1f24:  8d 04 d5 20 00 00 00    lea    0x20(,%edx,8),%eax
    1f2b:  03 86 90 00 00 00       add    0x90(%esi),%eax

This seems to be at the beginning of this block, at new_segment :

866                     int size = min_t(size_t, psize, PAGE_SIZE - offset);
867                     bool can_coalesce;
868
869                     if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
870 new_segment:
871                             if (!sk_stream_memory_free(sk))
872                                     goto wait_for_sndbuf;
873

I'm trying to add some debug code.

Regards,
Willy

^ permalink raw reply	[flat|nested] 12+ messages in thread
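The bisection above follows the usual git workflow; a minimal sketch of
it, assuming v3.6 as the known-good baseline in Linus' tree (the exact
refs used are not stated in the thread) :

    git bisect start
    git bisect bad v3.7-rc5      # GRO + splice panics here
    git bisect good v3.6         # the 3.6 series was reported fine
    # at each step: build and boot the candidate kernel, run the
    # splice test, then mark the result:
    git bisect good              # or `git bisect bad` if it panicked
    # repeat until git names the first bad commit (69b08f62 here)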
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 19:43   ` Willy Tarreau
@ 2012-12-01 20:52     ` Willy Tarreau
  2012-12-01 21:47       ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Willy Tarreau @ 2012-12-01 20:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 2878 bytes --]

OK, now I have a simple and reliable reproducer for the bug. It just
creates 2 TCP sockets connected to each other and splices from one to
the other. It looks 100% reliable. I'm attaching it, hoping it can help;
it's much easier and more reliable than starting haproxy with clients
and servers!

I'm noting that it only crashes when I send more than one page of data.
4096 bytes and below are OK, 4097 and above do crash.

I have added a few printk() calls and it dies in get_page() called from
do_tcp_sendpages() :

888                     i = skb_shinfo(skb)->nr_frags;
889                     printk(KERN_DEBUG "%d: page=%p i=%d skb=%p offset=%d\n", __LINE__, page, i, skb, offset);
890                     can_coalesce = skb_can_coalesce(skb, i, page, offset);
891                     printk(KERN_DEBUG "%d: can_coalesce=%d\n", __LINE__, can_coalesce);
892                     if (!can_coalesce && i >= MAX_SKB_FRAGS) {
893                             printk(KERN_DEBUG "%d\n", __LINE__);
894                             tcp_mark_push(tp, skb);
895                             goto new_segment;
896                     }
897                     if (!sk_wmem_schedule(sk, copy)) {
898                             printk(KERN_DEBUG "%d\n", __LINE__);
899                             goto wait_for_memory;
900                     }
901
902                     if (can_coalesce) {
903                             printk(KERN_DEBUG "%d\n", __LINE__);
904                             skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
905                     } else {
906                             printk(KERN_DEBUG "%d\n", __LINE__);
907                             get_page(page);
908                             printk(KERN_DEBUG "%d\n", __LINE__);
909                             skb_fill_page_desc(skb, i, page, offset, copy);
910                     }
911
912                     printk(KERN_DEBUG "%d\n", __LINE__);
913                     skb->len += copy;
914                     skb->data_len += copy;

dmesg starts with :

889: page=f77b9ca0 i=0 skb=f5b91540 offset=0
891: can_coalesce=0
906
908
912
871
875
880
889: page=f6e7a300 i=0 skb=f46f5c80 offset=2513
891: can_coalesce=0
906
908
912
889: page=0000fb80 i=1 skb=f46f5c80 offset=0
891: can_coalesce=0
906
BUG: unable to handle kernel paging request at 0000fb80
IP: [<c13b4878>] tcp_sendpage+0x568/0x6d0

If that can help, I also noticed this one once with a page that could
look valid, so the error was propagated further (the send() was 64kB) :

...
906
908
912
889: page=f6e79f80 i=6 skb=f37a1c80 offset=0
891: can_coalesce=0
906
908
912
889: page=c13d58a7 i=7 skb=f37a1c80 offset=927
891: can_coalesce=0
906
908
912
889: page=c13b4310 i=8 skb=f37a1c80 offset=3877
891: can_coalesce=0
906
BUG: unable to handle kernel paging request at 8b31752a
IP: [<c1094371>] __get_page_tail+0x31/0x90
*pde = 00000000

Now I'm really at a loss, so do not hesitate to ask me for more info,
because I don't know where to look.

Cheers,
Willy

[-- Attachment #2: splice-bug.c --]
[-- Type: text/plain, Size: 907 bytes --]

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void fail(const char *e)
{
	perror(e);
	exit(1);
}

int main(int argc, char **argv)
{
	struct sockaddr sa;
	socklen_t sl;
	int p[2];
	int l, s, r;
	char buffer[4097];

	l = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (l < 0)
		fail("L: socket()");

	if (listen(l, 1) < 0)
		fail("L: listen()");

	sl = sizeof(sa);
	if (getsockname(l, &sa, &sl) < 0)
		fail("L: getsockname()");

	if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
		fail("S: socket()");

	if (connect(s, &sa, sl) < 0)
		fail("S: connect()");

	sl = sizeof(sa);
	if ((r = accept(l, &sa, &sl)) < 0)
		fail("R: accept()");

	pipe(p);
	send(s, buffer, sizeof(buffer), MSG_DONTWAIT);
	l = splice(r, NULL, p[1], NULL, 1048576, 3);
	splice(p[0], NULL, s, NULL, l, 3);
	exit(0);
}

^ permalink raw reply	[flat|nested] 12+ messages in thread
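The attached reproducer is self-contained; a typical way to build and
run it would be the following (the compiler invocation is assumed, not
part of the thread) :

    cc -Wall -o splice-bug splice-bug.c
    ./splice-bug     # on an affected kernel, expect an immediate oops

Note the 4097-byte buffer: it is just over one page, which matches the
observation that sends of 4096 bytes or less do not crash.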
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 20:52     ` Willy Tarreau
@ 2012-12-01 21:47       ` Eric Dumazet
  2012-12-01 22:32         ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2012-12-01 21:47 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Sat, 2012-12-01 at 21:52 +0100, Willy Tarreau wrote:
> OK, now I have a simple and reliable reproducer for the bug. It just
> creates 2 TCP sockets connected to each other and splices from one to
> the other. It looks 100% reliable. I'm attaching it, hoping it can help;
> it's much easier and more reliable than starting haproxy with clients
> and servers!
>
> I'm noting that it only crashes when I send more than one page of data.
> 4096 bytes and below are OK, 4097 and above do crash.
>
> I have added a few printk() calls and it dies in get_page() called from
> do_tcp_sendpages() :
>
> 888                     i = skb_shinfo(skb)->nr_frags;
> 889                     printk(KERN_DEBUG "%d: page=%p i=%d skb=%p offset=%d\n", __LINE__, page, i, skb, offset);
> 890                     can_coalesce = skb_can_coalesce(skb, i, page, offset);
> 891                     printk(KERN_DEBUG "%d: can_coalesce=%d\n", __LINE__, can_coalesce);
> 892                     if (!can_coalesce && i >= MAX_SKB_FRAGS) {
> 893                             printk(KERN_DEBUG "%d\n", __LINE__);
> 894                             tcp_mark_push(tp, skb);
> 895                             goto new_segment;
> 896                     }
> 897                     if (!sk_wmem_schedule(sk, copy)) {
> 898                             printk(KERN_DEBUG "%d\n", __LINE__);
> 899                             goto wait_for_memory;
> 900                     }
> 901
> 902                     if (can_coalesce) {
> 903                             printk(KERN_DEBUG "%d\n", __LINE__);
> 904                             skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
> 905                     } else {
> 906                             printk(KERN_DEBUG "%d\n", __LINE__);
> 907                             get_page(page);
> 908                             printk(KERN_DEBUG "%d\n", __LINE__);
> 909                             skb_fill_page_desc(skb, i, page, offset, copy);
> 910                     }
> 911
> 912                     printk(KERN_DEBUG "%d\n", __LINE__);
> 913                     skb->len += copy;
> 914                     skb->data_len += copy;
>
> dmesg starts with :
>
> 889: page=f77b9ca0 i=0 skb=f5b91540 offset=0
> 891: can_coalesce=0
> 906
> 908
> 912
> 871
> 875
> 880
> 889: page=f6e7a300 i=0 skb=f46f5c80 offset=2513
> 891: can_coalesce=0
> 906
> 908
> 912
> 889: page=0000fb80 i=1 skb=f46f5c80 offset=0
> 891: can_coalesce=0
> 906
> BUG: unable to handle kernel paging request at 0000fb80
> IP: [<c13b4878>] tcp_sendpage+0x568/0x6d0
>
> If that can help, I also noticed this one once with a page that could
> look valid, so the error was propagated further (the send() was 64kB) :
>
> ...
> 906
> 908
> 912
> 889: page=f6e79f80 i=6 skb=f37a1c80 offset=0
> 891: can_coalesce=0
> 906
> 908
> 912
> 889: page=c13d58a7 i=7 skb=f37a1c80 offset=927
> 891: can_coalesce=0
> 906
> 908
> 912
> 889: page=c13b4310 i=8 skb=f37a1c80 offset=3877
> 891: can_coalesce=0
> 906
> BUG: unable to handle kernel paging request at 8b31752a
> IP: [<c1094371>] __get_page_tail+0x31/0x90
> *pde = 00000000
>
> Now I'm really at a loss, so do not hesitate to ask me for more info,
> because I don't know where to look.

Thanks a lot Willy

I believe do_tcp_sendpages() needs a fix, I'll send a patch asap

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 21:47       ` Eric Dumazet
@ 2012-12-01 22:32         ` Eric Dumazet
  2012-12-01 22:40           ` Willy Tarreau
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2012-12-01 22:32 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Sat, 2012-12-01 at 13:47 -0800, Eric Dumazet wrote:
> Thanks a lot Willy
>
> I believe do_tcp_sendpages() needs a fix, I'll send a patch asap
>

Could you try the following patch ?

do_tcp_sendpages() looks really wrong, as only one page is provided by
the caller.

Thanks !

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e6eace1..6976dba 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -831,8 +831,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
-			 size_t psize, int flags)
+static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+				size_t size, int flags)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int mss_now, size_goal;
@@ -859,12 +859,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto out_err;
 
-	while (psize > 0) {
+	while (size > 0) {
 		struct sk_buff *skb = tcp_write_queue_tail(sk);
-		struct page *page = pages[poffset / PAGE_SIZE];
 		int copy, i;
-		int offset = poffset % PAGE_SIZE;
-		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 		bool can_coalesce;
 
 		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
@@ -913,8 +910,8 @@ new_segment:
 			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
 
 		copied += copy;
-		poffset += copy;
-		if (!(psize -= copy))
+		offset += copy;
+		if (!(size -= copy))
 			goto out;
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
@@ -961,7 +958,7 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 					flags);
 
 	lock_sock(sk);
-	res = do_tcp_sendpages(sk, &page, offset, size, flags);
+	res = do_tcp_sendpages(sk, page, offset, size, flags);
 	release_sock(sk);
 	return res;
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread
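The failure mechanism is easy to see once the removed lines are laid
out together. A condensed sketch of the pre-fix logic follows (an
illustration assembled from the lines the patch deletes, not the
complete kernel source) :

	/*
	 * The only caller, tcp_sendpage(), did:
	 *
	 *	res = do_tcp_sendpages(sk, &page, offset, size, flags);
	 *
	 * i.e. it passed the address of a SINGLE page pointer, yet the
	 * loop below indexed it as an array of order-0 pages.  Once GRO
	 * started handing over high-order pages, offset + size could
	 * exceed PAGE_SIZE, the index became >= 1, and the load read
	 * whatever sat next to `page` on the caller's stack, matching
	 * the garbage "page=0000fb80" value in the oops above.
	 */
	while (psize > 0) {
		struct page *page = pages[poffset / PAGE_SIZE];	/* out-of-bounds read */
		int offset = poffset % PAGE_SIZE;
		int size = min_t(size_t, psize, PAGE_SIZE - offset);
		...
		get_page(page);		/* crashes on the garbage pointer */
		...
	}

The fix simply passes the single page straight through and lets offset
run past PAGE_SIZE within it, which is valid for a high-order page.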
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 22:32         ` Eric Dumazet
@ 2012-12-01 22:40           ` Willy Tarreau
  2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
  2012-12-02  2:58             ` GRO + splice panics in 3.7.0-rc5 Eric Dumazet
  0 siblings, 2 replies; 12+ messages in thread
From: Willy Tarreau @ 2012-12-01 22:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Sat, Dec 01, 2012 at 02:32:01PM -0800, Eric Dumazet wrote:
> On Sat, 2012-12-01 at 13:47 -0800, Eric Dumazet wrote:
>
> > Thanks a lot Willy
> >
> > I believe do_tcp_sendpages() needs a fix, I'll send a patch asap
> >
>
> Could you try the following patch ?
>
> do_tcp_sendpages() looks really wrong, as only one page is provided by
> the caller.

Excellent Eric, it's rock solid now, both with the reproducer and with
haproxy! Feel free to add my Tested-by if you want.

I really think we should feed this to Linus quickly before he releases
3.7, as I'm realizing that it would really not be nice to have a
user-triggerable crash in a release :-/

Thanks !
Willy

^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH net] tcp: fix crashes in do_tcp_sendpages()
  2012-12-01 22:40           ` Willy Tarreau
@ 2012-12-01 23:07             ` Eric Dumazet
  2012-12-02  1:41               ` David Miller
  2012-12-02  1:43               ` Joe Perches
  1 sibling, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-12-01 23:07 UTC (permalink / raw)
  To: Willy Tarreau, David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Recent network changes allowed high order pages to be used
for skb fragments.

This uncovered a bug in do_tcp_sendpages() which was assuming its caller
provided an array of order-0 page pointers.

We only have to deal with a single page in this function, and its order
is irrelevant.

Reported-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 083092e..e457c7a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -830,8 +830,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
-			 size_t psize, int flags)
+static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+				size_t size, int flags)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int mss_now, size_goal;
@@ -858,12 +858,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto out_err;
 
-	while (psize > 0) {
+	while (size > 0) {
 		struct sk_buff *skb = tcp_write_queue_tail(sk);
-		struct page *page = pages[poffset / PAGE_SIZE];
 		int copy, i;
-		int offset = poffset % PAGE_SIZE;
-		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 		bool can_coalesce;
 
 		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
@@ -912,8 +909,8 @@ new_segment:
 			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
 
 		copied += copy;
-		poffset += copy;
-		if (!(psize -= copy))
+		offset += copy;
+		if (!(size -= copy))
 			goto out;
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
@@ -960,7 +957,7 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 					flags);
 
 	lock_sock(sk);
-	res = do_tcp_sendpages(sk, &page, offset, size, flags);
+	res = do_tcp_sendpages(sk, page, offset, size, flags);
 	release_sock(sk);
 	return res;
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH net] tcp: fix crashes in do_tcp_sendpages()
  2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
@ 2012-12-02  1:41               ` David Miller
  2012-12-02  1:43               ` Joe Perches
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2012-12-02 1:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: w, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 01 Dec 2012 15:07:02 -0800

> From: Eric Dumazet <edumazet@google.com>
>
> Recent network changes allowed high order pages to be used
> for skb fragments.
>
> This uncovered a bug in do_tcp_sendpages() which was assuming its caller
> provided an array of order-0 page pointers.
>
> We only have to deal with a single page in this function, and its order
> is irrelevant.
>
> Reported-by: Willy Tarreau <w@1wt.eu>
> Tested-by: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH net] tcp: fix crashes in do_tcp_sendpages()
  2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
  2012-12-02  1:41               ` David Miller
@ 2012-12-02  1:43               ` Joe Perches
  1 sibling, 0 replies; 12+ messages in thread
From: Joe Perches @ 2012-12-02 1:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, David Miller, netdev

On Sat, 2012-12-01 at 15:07 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Recent network changes allowed high order pages to be used
> for skb fragments.
>
> This uncovered a bug in do_tcp_sendpages() which was assuming its caller
> provided an array of order-0 page pointers.
>
> We only have to deal with a single page in this function, and its order
> is irrelevant.
[]
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
[]
> @@ -830,8 +830,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>  	return mss_now;
>  }
>  
> -static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
> -			 size_t psize, int flags)
> +static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
> +				size_t size, int flags)

Shouldn't this now be named do_tcp_sendpage ?

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: GRO + splice panics in 3.7.0-rc5
  2012-12-01 22:40           ` Willy Tarreau
  2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
@ 2012-12-02  2:58             ` Eric Dumazet
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-12-02 2:58 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Sat, 2012-12-01 at 23:40 +0100, Willy Tarreau wrote:

> Excellent Eric, it's rock solid now, both with the reproducer and with
> haproxy! Feel free to add my Tested-by if you want.

You did very good work, thanks again Willy.

^ permalink raw reply	[flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-12-02  2:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-15 22:28 GRO + splice panics in 3.7.0-rc5 Willy Tarreau
2012-11-15 23:49 ` Eric Dumazet
2012-11-16  6:39   ` Willy Tarreau
2012-12-01 19:43   ` Willy Tarreau
2012-12-01 20:52     ` Willy Tarreau
2012-12-01 21:47       ` Eric Dumazet
2012-12-01 22:32         ` Eric Dumazet
2012-12-01 22:40           ` Willy Tarreau
2012-12-01 23:07             ` [PATCH net] tcp: fix crashes in do_tcp_sendpages() Eric Dumazet
2012-12-02  1:41               ` David Miller
2012-12-02  1:43               ` Joe Perches
2012-12-02  2:58             ` GRO + splice panics in 3.7.0-rc5 Eric Dumazet