Re: maximum buffer size for splice(2) tcp->pipe?

netdev.vger.kernel.org archive mirror

* Re: maximum buffer size for splice(2) tcp->pipe?
       [not found] <E1LKrp3-004Tub-IR@intern.SerNet.DE>
@ 2009-01-13 20:37 ` Andrew Morton
  2009-01-13 23:15   ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2009-01-13 20:37 UTC (permalink / raw)
  To: Volker.Lendecke; +Cc: linux-kernel, Steven French, Jens Axboe, netdev

(cc's added)

On Thu, 8 Jan 2009 11:13:51 +0100
Volker Lendecke <Volker.Lendecke@SerNet.DE> wrote:

> Hi!
> 
> While implementing splice support in Samba for better
> performance I found it blocking when trying to pull data off
> tcp into a pipe when the recvq was full. Attached find a
> test program that shows this behaviour, on another host I
> started
> 
> netcat 192.168.19.10 4711 < /dev/zero
> 
> vlendec@lenny:~$ uname -a
> Linux lenny 2.6.28-06857-g5cbd04a #7 Wed Jan 7 10:10:42 CET 2009 x86_64 GNU/Linux
> vlendec@lenny:~$ gcc -o splicetest /host/home/vlendec/splicetest.c -O3 -Wall
> vlendec@lenny:~$ ./splicetest out 65536 &
> [1] 697
> vlendec@lenny:~$ strace -p 697
> Process 697 attached - interrupt to quit
> splice(0x3, 0, 0x5, 0, 0x56a0, 0x1)     = 22176
> splice(0x7, 0, 0x4, 0, 0x10000, 0x1^C <unfinished ...>
> Process 697 detached
> vlendec@lenny:~$ netstat -nt | grep 4711
> tcp    69272      0 192.168.19.10:4711 192.168.19.1:33773 ESTABLISHED
> vlendec@lenny:~$
> 
> Interestingly, whenever I start the strace, it gets another
> chunk of data and then blocks in the next splice call.
> 
> If I start splicetest with a buffer size of 16384 instead of
> 65536, it does not block. I could not find a way to ask the
> kernel for the tipping point below which it does not block.
> 
> What is a safe buffer size to use with splice?
> 
> BTW, this kernel is from Steve French's linux-cifs.git repo.
> 
> Thanks,
> 
> Volker Lendecke
> 
> Samba Team
> 
> P.S: I'm not subscribed to linux-kernel, so if possible
> please CC me directly. If this is inappropriate behaviour,
> please give me a quick hint :-)
> 



* Re: maximum buffer size for splice(2) tcp->pipe?
  2009-01-13 20:37 ` maximum buffer size for splice(2) tcp->pipe? Andrew Morton
@ 2009-01-13 23:15   ` Eric Dumazet
  2009-01-13 23:38     ` Eric Dumazet
  2009-01-14  7:40     ` Volker Lendecke
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2009-01-13 23:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Volker.Lendecke, linux-kernel, Steven French, Jens Axboe, netdev

Andrew Morton wrote:
> (cc's added)
> 
> On Thu, 8 Jan 2009 11:13:51 +0100
> Volker Lendecke <Volker.Lendecke@SerNet.DE> wrote:
> 
>> Hi!
>>
>> While implementing splice support in Samba for better
>> performance I found it blocking when trying to pull data off
>> tcp into a pipe when the recvq was full. Attached find a
>> test program that shows this behaviour, on another host I
>> started
>>
>> netcat 192.168.19.10 4711 < /dev/zero
>>
>> vlendec@lenny:~$ uname -a
>> Linux lenny 2.6.28-06857-g5cbd04a #7 Wed Jan 7 10:10:42 CET 2009 x86_64 GNU/Linux
>> vlendec@lenny:~$ gcc -o splicetest /host/home/vlendec/splicetest.c -O3 -Wall
>> vlendec@lenny:~$ ./splicetest out 65536 &
>> [1] 697
>> vlendec@lenny:~$ strace -p 697
>> Process 697 attached - interrupt to quit
>> splice(0x3, 0, 0x5, 0, 0x56a0, 0x1)     = 22176
>> splice(0x7, 0, 0x4, 0, 0x10000, 0x1^C <unfinished ...>

Volker, is your splice() a blocking one, from a TCP socket to a pipe?

If no other thread is reading the pipe, you might block forever
in splice_to_pipe() as soon as the pipe is full (16 pages).

As pages are not necessarily full (each skb uses at least one page, even if
its length is small), it is not really possible to use splice() like this.

With the current kernel, the only safe way in your case would be to call
splice() asking for no more than 16 bytes, which would be really insane for
your needs.

You may prefer a nonblocking mode, at least when calling splice_to_pipe().

Maybe the SPLICE_F_NONBLOCK splice() flag should only apply to the pipe side.
tcp_splice_read() should not use this flag to select a blocking/nonblocking
mode on the source socket, but the underlying file flag.

This way, your program could leave the socket in blocking mode, yet call
splice() with the SPLICE_F_NONBLOCK flag so as not to block on the pipe.
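
The point that a pipe fills up by buffers rather than by bytes can be
observed from user space. The following is only a sketch under stated
assumptions: it uses vmsplice(2) instead of a TCP socket to queue one byte
per pipe buffer, and the helper name count_one_byte_buffers() is made up
for this illustration. On a pipe with the 16-buffer default mentioned above
one would expect it to stop after about 16 bytes, but the exact number
depends on the kernel and its pipe limits.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): queue one byte per pipe
 * buffer with vmsplice() until the pipe reports that it is full.
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 */
int count_one_byte_buffers(void)
{
	static char byte = 'x';
	int pfd[2];
	int queued = 0;

	if (pipe(pfd) < 0)
		return -1;

	for (;;) {
		struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
		/* SPLICE_F_NONBLOCK: fail with EAGAIN instead of sleeping. */
		ssize_t n = vmsplice(pfd[1], &iov, 1, SPLICE_F_NONBLOCK);

		if (n == 1) {
			queued++;
			continue;
		}
		if (n < 0 && errno == EAGAIN)
			break;		/* every pipe buffer is in use */
		queued = -1;		/* anything else is unexpected here */
		break;
	}

	close(pfd[0]);
	close(pfd[1]);
	return queued;
}
```

A caller would simply print the returned count. A result far below the
pipe's nominal byte capacity is what makes a large blocking splice() length
unsafe when the same thread also has to drain the pipe.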


* Re: maximum buffer size for splice(2) tcp->pipe?
  2009-01-13 23:15   ` Eric Dumazet
@ 2009-01-13 23:38     ` Eric Dumazet
  2009-01-15  4:58       ` David Miller
  2009-01-14  7:40     ` Volker Lendecke
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-01-13 23:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Volker.Lendecke, linux-kernel, Steven French, Jens Axboe, netdev,
	David S. Miller

Eric Dumazet wrote:
> Andrew Morton wrote:
>> (cc's added)
>>
>> On Thu, 8 Jan 2009 11:13:51 +0100
>> Volker Lendecke <Volker.Lendecke@SerNet.DE> wrote:
>>
>>> Hi!
>>>
>>> While implementing splice support in Samba for better
>>> performance I found it blocking when trying to pull data off
>>> tcp into a pipe when the recvq was full. Attached find a
>>> test program that shows this behaviour, on another host I
>>> started
>>>
>>> netcat 192.168.19.10 4711 < /dev/zero
>>>
>>> vlendec@lenny:~$ uname -a
>>> Linux lenny 2.6.28-06857-g5cbd04a #7 Wed Jan 7 10:10:42 CET 2009 x86_64 GNU/Linux
>>> vlendec@lenny:~$ gcc -o splicetest /host/home/vlendec/splicetest.c -O3 -Wall
>>> vlendec@lenny:~$ ./splicetest out 65536 &
>>> [1] 697
>>> vlendec@lenny:~$ strace -p 697
>>> Process 697 attached - interrupt to quit
>>> splice(0x3, 0, 0x5, 0, 0x56a0, 0x1)     = 22176
>>> splice(0x7, 0, 0x4, 0, 0x10000, 0x1^C <unfinished ...>
> 
> Volker, is your splice() a blocking one, from a TCP socket to a pipe?
> 
> If no other thread is reading the pipe, you might block forever
> in splice_to_pipe() as soon as the pipe is full (16 pages).
> 
> As pages are not necessarily full (each skb uses at least one page, even if
> its length is small), it is not really possible to use splice() like this.
> 
> With the current kernel, the only safe way in your case would be to call
> splice() asking for no more than 16 bytes, which would be really insane for
> your needs.
> 
> You may prefer a nonblocking mode, at least when calling splice_to_pipe().
> 
> Maybe the SPLICE_F_NONBLOCK splice() flag should only apply to the pipe side.
> tcp_splice_read() should not use this flag to select a blocking/nonblocking
> mode on the source socket, but the underlying file flag.
> 
> This way, your program could leave the socket in blocking mode, yet call
> splice() with the SPLICE_F_NONBLOCK flag so as not to block on the pipe.
> 

This patch, coupled with the previous one from Willy Tarreau
(tcp: splice as many packets as possible at once),
gives the expected result.

[PATCH] net: splice() from tcp to pipe should take into account O_NONBLOCK

Instead of using SPLICE_F_NONBLOCK to select a nonblocking mode on both the
source tcp socket and the pipe destination, we use the underlying file flag
(O_NONBLOCK) to select a nonblocking socket.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/include/linux/net.h b/include/linux/net.h
index 4515efa..10e38d1 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -185,7 +185,7 @@ struct proto_ops {
 				      struct vm_area_struct * vma);
 	ssize_t		(*sendpage)  (struct socket *sock, struct page *page,
 				      int offset, size_t size, int flags);
-	ssize_t 	(*splice_read)(struct socket *sock,  loff_t *ppos,
+	ssize_t 	(*splice_read)(struct file *file,  loff_t *ppos,
 				       struct pipe_inode_info *pipe, size_t len, unsigned int flags);
 };
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 218235d..e8e7f80 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -309,7 +309,7 @@ extern int			tcp_twsk_unique(struct sock *sk,
 
 extern void			tcp_twsk_destructor(struct sock *sk);
 
-extern ssize_t			tcp_splice_read(struct socket *sk, loff_t *ppos,
+extern ssize_t			tcp_splice_read(struct file *file, loff_t *ppos,
 					        struct pipe_inode_info *pipe, size_t len, unsigned int flags);
 
 static inline void tcp_dec_quickack_mode(struct sock *sk,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ce572f9..c777d88 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -548,10 +548,11 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
  *    Will read pages from given socket and fill them into a pipe.
  *
  **/
-ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
+ssize_t tcp_splice_read(struct file *file, loff_t *ppos,
 			struct pipe_inode_info *pipe, size_t len,
 			unsigned int flags)
 {
+	struct socket *sock = file->private_data;
 	struct sock *sk = sock->sk;
 	struct tcp_splice_state tss = {
 		.pipe = pipe,
@@ -572,7 +573,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 
 	lock_sock(sk);
 
-	timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+	timeo = sock_rcvtimeo(sk, file->f_flags & O_NONBLOCK);
 	while (tss.len) {
 		ret = __tcp_splice_read(sk, &tss);
 		if (ret < 0)



* Re: maximum buffer size for splice(2) tcp->pipe?
  2009-01-13 23:15   ` Eric Dumazet
  2009-01-13 23:38     ` Eric Dumazet
@ 2009-01-14  7:40     ` Volker Lendecke
  2009-01-14  9:13       ` Eric Dumazet
  1 sibling, 1 reply; 9+ messages in thread
From: Volker Lendecke @ 2009-01-14  7:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, linux-kernel, Steven French, Jens Axboe, netdev


On Wed, Jan 14, 2009 at 12:15:04AM +0100, Eric Dumazet wrote:
> Volker, is your splice() a blocking one, from a TCP socket to a pipe?

Yes, it is.

> If no other thread is reading the pipe, you might block forever
> in splice_to_pipe() as soon as the pipe is full (16 pages).

Why does it block when the pipe is full? Why doesn't it
return a short read, just like the read(2) call does? We
need to cope with that behaviour anyway.

> As pages are not necessarily full (each skb uses at least one page, even if
> its length is small), it is not really possible to use splice() like this.
> 
> With the current kernel, the only safe way in your case would be to call
> splice() asking for no more than 16 bytes, which would be really insane for
> your needs.
> 
> You may prefer a nonblocking mode, at least when calling splice_to_pipe().

Which fd do I have to set the nonblocking flag on? The TCP
socket I read from, or the pipe I write to?

Thanks for the hint anyway :-)

Volker



* Re: maximum buffer size for splice(2) tcp->pipe?
  2009-01-14  7:40     ` Volker Lendecke
@ 2009-01-14  9:13       ` Eric Dumazet
  2009-01-14 10:03         ` Volker Lendecke
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-01-14  9:13 UTC (permalink / raw)
  To: Volker.Lendecke
  Cc: Andrew Morton, linux-kernel, Steven French, Jens Axboe, netdev

Volker Lendecke wrote:
> On Wed, Jan 14, 2009 at 12:15:04AM +0100, Eric Dumazet wrote:
>> Volker, is your splice() a blocking one, from a TCP socket to a pipe?
> 
> Yes, it is.
> 
>> If no other thread is reading the pipe, you might block forever
>> in splice_to_pipe() as soon as the pipe is full (16 pages).
> 
> Why does it block when the pipe is full? Why doesn't it
> return a short read, just like the read(2) call does? We
> need to cope with that behaviour anyway.

Well, check the code in fs/splice.c, function splice_to_pipe().

If SPLICE_F_NONBLOCK is not set, it is *expected* to block on the pipe.

In this mode, only another thread is able to drain the pipe and wake up the blocked thread.

Code review:

When all pages are in use, "if (pipe->nrbufs == PIPE_BUFFERS)":

                if (spd->flags & SPLICE_F_NONBLOCK) {
                        if (!ret)
                                ret = -EAGAIN;
                        break;
                }

                if (signal_pending(current)) {
                        if (!ret)
                                ret = -ERESTARTSYS;
                        break;
                }

                if (do_wakeup) {
                        smp_mb();
                        if (waitqueue_active(&pipe->wait))
                                wake_up_interruptible_sync(&pipe->wait);
                        kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
                        do_wakeup = 0;
                }

                pipe->waiting_writers++;
HERE >>         pipe_wait(pipe);
                pipe->waiting_writers--;


> 
>> As pages are not necessarily full (each skb uses at least one page, even if
>> its length is small), it is not really possible to use splice() like this.
>>
>> With the current kernel, the only safe way in your case would be to call
>> splice() asking for no more than 16 bytes, which would be really insane for
>> your needs.
>>
>> You may prefer a nonblocking mode, at least when calling splice_to_pipe().
> 
> Which fd do I have to set the nonblocking flag on? The TCP
> socket I read from, or the pipe I write to?

I would say, use the SPLICE_F_NONBLOCK flag on the splice() system call,
but leave the tcp socket in blocking mode... But with the current kernel that
won't work. In order to avoid busy looping, you might add a poll()/select()
so that splice(SPLICE_F_NONBLOCK) is only called when the socket has data
in its receive queue.

for (;;) {
	struct pollfd pfd;
	pfd.fd = socket;
	pfd.events = POLLIN;
	if (poll(&pfd, 1, -1) != 1)
		continue;
	res = splice(socket, NULL, pipefds[1], NULL, 65536, SPLICE_F_MOVE|SPLICE_F_NONBLOCK);
	if (res > 0)
		nwritten = splice(pipefds[0], NULL, file_fd, NULL, res, SPLICE_F_MOVE|SPLICE_F_MORE);
}

Unfortunately, splice() from a tcp socket to a pipe does not work as is
without SPLICE_F_NONBLOCK when the same thread both writes and reads the
pipe: you risk a deadlock.
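
A fuller, hedged version of the loop above might look like the sketch
below. The function names relay_socket_to_file() and relay_demo() are made
up for this illustration, and the self-check uses an AF_UNIX socketpair as
a stand-in for the TCP connection, assuming the running kernel can splice
from such a socket. It waits in poll(), never blocks on the pipe, and
drains the pipe completely before reading from the socket again.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Sketch only: relay everything readable on sock into out_fd through a
 * private pipe. Returns the number of bytes relayed, or -1 on error.
 */
ssize_t relay_socket_to_file(int sock, int out_fd)
{
	int pfd[2];
	ssize_t total = 0;

	if (pipe(pfd) < 0)
		return -1;

	for (;;) {
		struct pollfd p = { .fd = sock, .events = POLLIN };
		ssize_t in;

		/* Sleep here, not inside splice(), until data or EOF shows up. */
		if (poll(&p, 1, -1) < 0) {
			if (errno == EINTR)
				continue;
			goto fail;
		}

		in = splice(sock, NULL, pfd[1], NULL, 65536,
			    SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
		if (in == 0)
			break;			/* peer closed the connection */
		if (in < 0) {
			if (errno == EAGAIN || errno == EINTR)
				continue;	/* nothing usable yet, poll again */
			goto fail;
		}

		/* Empty the pipe before asking the socket for more. */
		while (in > 0) {
			ssize_t out = splice(pfd[0], NULL, out_fd, NULL, in,
					     SPLICE_F_MOVE | SPLICE_F_MORE);
			if (out < 0 && errno == EINTR)
				continue;
			if (out <= 0)
				goto fail;
			in -= out;
			total += out;
		}
	}

	close(pfd[0]);
	close(pfd[1]);
	return total;
fail:
	close(pfd[0]);
	close(pfd[1]);
	return -1;
}

/* Hypothetical self-check: returns 0 if "hello" arrives intact in a temp file. */
int relay_demo(void)
{
	char path[] = "/tmp/splice_relay_XXXXXX";
	char buf[16] = { 0 };
	int sv[2], out, ok = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	out = mkstemp(path);
	if (out >= 0) {
		unlink(path);			/* keep only the open descriptor */
		if (write(sv[1], "hello", 5) == 5) {
			close(sv[1]);		/* the reader will see EOF */
			sv[1] = -1;
			ok = relay_socket_to_file(sv[0], out) == 5 &&
			     pread(out, buf, sizeof(buf), 0) == 5 &&
			     memcmp(buf, "hello", 5) == 0;
		}
		close(out);
	}
	close(sv[0]);
	if (sv[1] >= 0)
		close(sv[1]);
	return ok ? 0 : -1;
}
```

Because the pipe is empty at the start of every iteration, the first
splice() cannot sleep on a full pipe; the price is the extra poll() call
per iteration.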




* Re: maximum buffer size for splice(2) tcp->pipe?
  2009-01-14  9:13       ` Eric Dumazet
@ 2009-01-14 10:03         ` Volker Lendecke
  2009-01-14 10:17           ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Volker Lendecke @ 2009-01-14 10:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, linux-kernel, Steven French, Jens Axboe, netdev


On Wed, Jan 14, 2009 at 10:13:34AM +0100, Eric Dumazet wrote:
> for (;;) {
> 	struct pollfd pfd;
> 	pfd.fd = socket;
> 	pfd.events = POLLIN;
> 	if (poll(&pfd, 1, -1) != 1)
> 		continue;
> 	res = splice(socket, NULL, pipefds[1], NULL, 65536, SPLICE_F_MOVE|SPLICE_F_NONBLOCK);
> 	if (res > 0)
> 		nwritten = splice(pipefds[0], NULL, file_fd, NULL, res, SPLICE_F_MOVE|SPLICE_F_MORE);
> }

Doesn't this reduce performance again? I thought the whole
point of splice() was to increase performance by avoiding
memory copies. If I have to do a poll syscall for each call
to splice, doesn't the context switch eat that performance
advantage again?

Or was splice designed only for multi-threaded applications
(which at least Samba is not)?

Volker



* Re: maximum buffer size for splice(2) tcp->pipe?
  2009-01-14 10:03         ` Volker Lendecke
@ 2009-01-14 10:17           ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2009-01-14 10:17 UTC (permalink / raw)
  To: Volker.Lendecke
  Cc: Andrew Morton, linux-kernel, Steven French, Jens Axboe, netdev

Volker Lendecke wrote:
> On Wed, Jan 14, 2009 at 10:13:34AM +0100, Eric Dumazet wrote:
>> for (;;) {
>> 	struct pollfd pfd;
>> 	pfd.fd = socket;
>> 	pfd.events = POLLIN;
>> 	if (poll(&pfd, 1, -1) != 1)
>> 		continue;
>> 	res = splice(socket, NULL, pipefds[1], NULL, 65536, SPLICE_F_MOVE|SPLICE_F_NONBLOCK);
>> 	if (res > 0)
>> 		nwritten = splice(pipefds[0], NULL, file_fd, NULL, res, SPLICE_F_MOVE|SPLICE_F_MORE);
>> }
> 
> Doesn't this reduce performance again? I thought the whole
> point of splice() was to increase performance by avoiding
> memory copies. If I have to do a poll syscall for each call
> to splice, doesn't the context switch eat that performance
> advantage again?
> 
> Or was splice designed only for multi-threaded applications
> (which at least Samba is not)?
> 
> Volker

splice() avoids memory copies, yes, but on typical 1460-byte
frames it's a small gain.

But if no data is available on the socket,
you still have to wait (and take a context switch later).

Waiting in poll() or in splice() has the same context switch cost.

The only extra cost is the additional syscall, of course, but it is mandatory
if you want to avoid a possible deadlock in the current splice()
implementation.



* Re: maximum buffer size for splice(2) tcp->pipe?
  2009-01-13 23:38     ` Eric Dumazet
@ 2009-01-15  4:58       ` David Miller
  2009-01-15 11:47         ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2009-01-15  4:58 UTC (permalink / raw)
  To: dada1; +Cc: akpm, Volker.Lendecke, linux-kernel, sfrench, jens.axboe, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 14 Jan 2009 00:38:32 +0100

> [PATCH] net: splice() from tcp to pipe should take into account O_NONBLOCK
> 
> Instead of using SPLICE_F_NONBLOCK to select a nonblocking mode on both the
> source tcp socket and the pipe destination, we use the underlying file flag
> (O_NONBLOCK) to select a nonblocking socket.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

This needs at least some more thought.

It seems, for one thing, that this change will interfere with the
intentions of the code in splice_direct_to_actor() which goes:

	/*
	 * Don't block on output, we have to drain the direct pipe.
	 */
	sd->flags &= ~SPLICE_F_NONBLOCK;


* Re: maximum buffer size for splice(2) tcp->pipe?
  2009-01-15  4:58       ` David Miller
@ 2009-01-15 11:47         ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2009-01-15 11:47 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, Volker.Lendecke, linux-kernel, sfrench, jens.axboe, netdev

David Miller wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Wed, 14 Jan 2009 00:38:32 +0100
> 
>> [PATCH] net: splice() from tcp to pipe should take into account O_NONBLOCK
>>
>> Instead of using SPLICE_F_NONBLOCK to select a nonblocking mode on both the
>> source tcp socket and the pipe destination, we use the underlying file flag
>> (O_NONBLOCK) to select a nonblocking socket.
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> 
> This needs at least some more thought.
> 
> It seems, for one thing, that this change will interfere with the
> intentions of the code in splice_direct_to_actor() which goes:
> 
> 	/*
> 	 * Don't block on output, we have to drain the direct pipe.
> 	 */
> 	sd->flags &= ~SPLICE_F_NONBLOCK;

Reading splice_direct_to_actor(), I see nothing wrong with the patch.

The patch is about splice from socket to pipe, while the sd->flags you mention
in splice_direct_to_actor() only applies to the splice from the internal pipe to
something else, as splice_direct_to_actor() allocates an internal pipe to perform
its work.

Also, the meaning of SPLICE_F_NONBLOCK, as explained in include/linux/splice.h, is:

#define SPLICE_F_NONBLOCK (0x02) /* don't block on the pipe splicing (but */
	/* we may still block on the fd we splice */
	/* from/to, of course */

If the comment is still correct, SPLICE_F_NONBLOCK only applies to the pipe involved in
the splice() syscall.

For the other file, it is either:
- A regular file: nonblocking mode is not available, as with a normal read()/write() syscall

- A socket: we should be able to specify whether it is blocking or not, independently of
            the SPLICE_F_NONBLOCK flag, which only applies to the pipe. The normal way
            is to use ioctl(FIONBIO) or an fcntl() call to change O_NONBLOCK in file->f_flags
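
As a small sketch of the fcntl() route just mentioned: the helper below
toggles O_NONBLOCK on a descriptor with F_GETFL/F_SETFL. The name
set_nonblocking() is an assumption made for this illustration, not an
existing API.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: switch O_NONBLOCK on (on != 0) or off (on == 0)
 * for fd. Returns 0 on success, -1 on failure with errno set by fcntl().
 */
int set_nonblocking(int fd, int on)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	if (on)
		flags |= O_NONBLOCK;
	else
		flags &= ~O_NONBLOCK;
	return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```

With the patch applied, set_nonblocking(sock, 0) would keep the socket side
of splice() blocking, while SPLICE_F_NONBLOCK would still govern only the
pipe side.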


In order to use splice() efficiently from a socket to a file, we need
to loop over:

{
splice(from blocking tcp socket to non blocking pipe, SPLICE_F_NONBLOCK); /* nonblocking pipe or risk deadlock */
splice(from pipe to file)
}


