[PATCH] tcp: splice: fix an infinite loop in tcp_read

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock()
@ 2013-01-10 17:06 Eric Dumazet
  2013-01-10 22:37 ` David Miller
  2013-01-10 23:01 ` Willy Tarreau
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2013-01-10 17:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 02275a2ee7c0 (tcp: don't abort splice() after small transfers)
added a regression.


[   83.843570] INFO: rcu_sched self-detected stall on CPU
[   83.844575] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=21002 jiffies, g=4457, c=4456, q=13132)
[   83.844582] Task dump for CPU 6:
[   83.844584] netperf         R  running task        0  8966   8952 0x0000000c
[   83.844587]  0000000000000000 0000000000000006 0000000000006c6c 0000000000000000
[   83.844589]  000000000000006c 0000000000000096 ffffffff819ce2bc ffffffffffffff10
[   83.844592]  ffffffff81088679 0000000000000010 0000000000000246 ffff880c4b9ddcd8
[   83.844594] Call Trace:
[   83.844596]  [<ffffffff81088679>] ? vprintk_emit+0x1c9/0x4c0
[   83.844601]  [<ffffffff815ad449>] ? schedule+0x29/0x70
[   83.844606]  [<ffffffff81537bd2>] ? tcp_splice_data_recv+0x42/0x50
[   83.844610]  [<ffffffff8153beaa>] ? tcp_read_sock+0xda/0x260
[   83.844613]  [<ffffffff81537b90>] ? tcp_prequeue_process+0xb0/0xb0
[   83.844615]  [<ffffffff8153c0f0>] ? tcp_splice_read+0xc0/0x250
[   83.844618]  [<ffffffff814dc0c2>] ? sock_splice_read+0x22/0x30
[   83.844622]  [<ffffffff811b820b>] ? do_splice_to+0x7b/0xa0
[   83.844627]  [<ffffffff811ba4bc>] ? sys_splice+0x59c/0x5d0
[   83.844630]  [<ffffffff8119745b>] ? putname+0x2b/0x40
[   83.844633]  [<ffffffff8118bcb4>] ? do_sys_open+0x174/0x1e0
[   83.844636]  [<ffffffff815b6202>] ? system_call_fastpath+0x16/0x1b


if recv_actor() returns 0, we should stop immediately,
because looping wont give a chance to drain the pipe.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1ca2536..5f173dc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1482,7 +1482,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 					break;
 			}
 			used = recv_actor(desc, skb, offset, len);
-			if (used < 0) {
+			if (used <= 0) {
 				if (!copied)
 					copied = used;
 				break;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock()
  2013-01-10 17:06 [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock() Eric Dumazet
@ 2013-01-10 22:37 ` David Miller
  2013-01-10 23:01 ` Willy Tarreau
  1 sibling, 0 replies; 6+ messages in thread
From: David Miller @ 2013-01-10 22:37 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, w

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 10 Jan 2013 09:06:10 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> commit 02275a2ee7c0 (tcp: don't abort splice() after small transfers)
> added a regression.
 ...
> if recv_actor() returns 0, we should stop immediately,
> because looping wont give a chance to drain the pipe.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock()
  2013-01-10 17:06 [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock() Eric Dumazet
  2013-01-10 22:37 ` David Miller
@ 2013-01-10 23:01 ` Willy Tarreau
  2013-01-10 23:05   ` Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: Willy Tarreau @ 2013-01-10 23:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

Hi Eric,

On Thu, Jan 10, 2013 at 09:06:10AM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> commit 02275a2ee7c0 (tcp: don't abort splice() after small transfers)
> added a regression.
> 
> 
> [   83.843570] INFO: rcu_sched self-detected stall on CPU
> [   83.844575] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=21002 jiffies, g=4457, c=4456, q=13132)
> [   83.844582] Task dump for CPU 6:
> [   83.844584] netperf         R  running task        0  8966   8952 0x0000000c
> [   83.844587]  0000000000000000 0000000000000006 0000000000006c6c 0000000000000000
> [   83.844589]  000000000000006c 0000000000000096 ffffffff819ce2bc ffffffffffffff10
> [   83.844592]  ffffffff81088679 0000000000000010 0000000000000246 ffff880c4b9ddcd8
> [   83.844594] Call Trace:
> [   83.844596]  [<ffffffff81088679>] ? vprintk_emit+0x1c9/0x4c0
> [   83.844601]  [<ffffffff815ad449>] ? schedule+0x29/0x70
> [   83.844606]  [<ffffffff81537bd2>] ? tcp_splice_data_recv+0x42/0x50
> [   83.844610]  [<ffffffff8153beaa>] ? tcp_read_sock+0xda/0x260
> [   83.844613]  [<ffffffff81537b90>] ? tcp_prequeue_process+0xb0/0xb0
> [   83.844615]  [<ffffffff8153c0f0>] ? tcp_splice_read+0xc0/0x250
> [   83.844618]  [<ffffffff814dc0c2>] ? sock_splice_read+0x22/0x30
> [   83.844622]  [<ffffffff811b820b>] ? do_splice_to+0x7b/0xa0
> [   83.844627]  [<ffffffff811ba4bc>] ? sys_splice+0x59c/0x5d0
> [   83.844630]  [<ffffffff8119745b>] ? putname+0x2b/0x40
> [   83.844633]  [<ffffffff8118bcb4>] ? do_sys_open+0x174/0x1e0
> [   83.844636]  [<ffffffff815b6202>] ? system_call_fastpath+0x16/0x1b
> 
> 
> if recv_actor() returns 0, we should stop immediately,
> because looping wont give a chance to drain the pipe.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Willy Tarreau <w@1wt.eu>
> ---
>  net/ipv4/tcp.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1ca2536..5f173dc 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1482,7 +1482,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
>  					break;
>  			}
>  			used = recv_actor(desc, skb, offset, len);
> -			if (used < 0) {
> +			if (used <= 0) {
>  				if (!copied)
>  					copied = used;
>  				break;

Thanks for catching this one. I'm amazed we didn't notice it earlier,
I've been running my stress-test kernels with this patch applied since
we did it.

Willy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock()
  2013-01-10 23:01 ` Willy Tarreau
@ 2013-01-10 23:05   ` Eric Dumazet
  2013-01-10 23:21     ` Willy Tarreau
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2013-01-10 23:05 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Miller, netdev

On Fri, 2013-01-11 at 00:01 +0100, Willy Tarreau wrote:
> Hi Eric,

> Thanks for catching this one. I'm amazed we didn't notice it earlier,
> I've been running my stress-test kernels with this patch applied since
> we did it.
> 

Thats because you splice(    very_large_amount_of_bytes), so you dont
hit this bug.

netperf does the splice (    exact_amount_of_bytes ) so hits this pretty
fast on loopback at least.

Thanks !

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock()
  2013-01-10 23:05   ` Eric Dumazet
@ 2013-01-10 23:21     ` Willy Tarreau
  2013-01-10 23:48       ` Rick Jones
  0 siblings, 1 reply; 6+ messages in thread
From: Willy Tarreau @ 2013-01-10 23:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On Thu, Jan 10, 2013 at 03:05:55PM -0800, Eric Dumazet wrote:
> On Fri, 2013-01-11 at 00:01 +0100, Willy Tarreau wrote:
> > Hi Eric,
> 
> > Thanks for catching this one. I'm amazed we didn't notice it earlier,
> > I've been running my stress-test kernels with this patch applied since
> > we did it.
> > 
> 
> Thats because you splice(    very_large_amount_of_bytes), so you dont
> hit this bug.

Not always, I use many sizes (from 1k to very large).

> netperf does the splice (    exact_amount_of_bytes ) so hits this pretty
> fast on loopback at least.

OK I see, if we need an exact size to trigger it, that explains it !

Thanks for the explanation,
Willy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock()
  2013-01-10 23:21     ` Willy Tarreau
@ 2013-01-10 23:48       ` Rick Jones
  0 siblings, 0 replies; 6+ messages in thread
From: Rick Jones @ 2013-01-10 23:48 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Eric Dumazet, David Miller, netdev

On 01/10/2013 03:21 PM, Willy Tarreau wrote:
> On Thu, Jan 10, 2013 at 03:05:55PM -0800, Eric Dumazet wrote:
>> Thats because you splice(    very_large_amount_of_bytes), so you dont
>> hit this bug.
>
> Not always, I use many sizes (from 1k to very large).
>
>> netperf does the splice (    exact_amount_of_bytes ) so hits this pretty
>> fast on loopback at least.
>
> OK I see, if we need an exact size to trigger it, that explains it !

Netperf does not use a specific size all the time - the size it uses on 
the receive will be the "receive_size" calculated the same way it has 
been since the beginning - either a size specified by a test-specific -M 
option, or based on the value of SO_RCVBUF at the time the socket was 
created.

The kernel of the code making the splice calls - recv_data_no_copy() in 
src/nettest_omni.c looks like:

recv_data_no_copy(SOCKET data_socket, struct ring_elt *recv_ring, 
uint32_t bytes_to_recv, struct sockaddr *source, netperf_socklen_t 
*sourcelen, uint32_t flags, uint32_t *num_receives) {

...

   do {

     bytes_recvd = splice(data_socket,
			 NULL,
			 pfd[1],
			 NULL,
			 bytes_left,
			 my_flags);

     if (bytes_recvd > 0) {
       /* per Eric Dumazet, we should just let this second splice call
	 move as many bytes as it can and not worry about how much.
	 this should make the call more robust when made on a system
	 under memory pressure */
       splice(pfd[0], NULL, fdnull, NULL, 1 << 30, my_flags);
       bytes_left -= bytes_recvd;
     }
     else {
       break;
     }
     my_recvs++; /* should the pair of splices count as one? */
   } while ((bytes_left > 0) && (flags & NETPERF_WAITALL));

where NETPERF_WAITALL is only set for an _RR test.  Bytes_left is 
initialized to bytes_to_recv which is the "receive_size." my_flags is 
set to 0x03.

Now, if there are no test-specific -M option (or -s or -S depending on 
the test) netperf will, from run to run use the same receive_size - 
under Linux chances are quite good that will be 87380.

happy benchmarking,

rick jones

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-01-10 23:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-10 17:06 [PATCH] tcp: splice: fix an infinite loop in tcp_read_sock() Eric Dumazet
2013-01-10 22:37 ` David Miller
2013-01-10 23:01 ` Willy Tarreau
2013-01-10 23:05   ` Eric Dumazet
2013-01-10 23:21     ` Willy Tarreau
2013-01-10 23:48       ` Rick Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).