From: Eric Dumazet <dada1@cosmosbay.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Volker.Lendecke@SerNet.DE, linux-kernel@vger.kernel.org,
Steven French <sfrench@us.ibm.com>,
Jens Axboe <jens.axboe@oracle.com>,
netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>
Subject: Re: maximum buffer size for splice(2) tcp->pipe?
Date: Wed, 14 Jan 2009 00:38:32 +0100
Message-ID: <496D25F8.2080505@cosmosbay.com>
In-Reply-To: <496D2078.9080302@cosmosbay.com>
Eric Dumazet wrote:
> Andrew Morton wrote:
>> (cc's added)
>>
>> On Thu, 8 Jan 2009 11:13:51 +0100
>> Volker Lendecke <Volker.Lendecke@SerNet.DE> wrote:
>>
>>> Hi!
>>>
>>> While implementing splice support in Samba for better
>>> performance I found it blocking when trying to pull data off
>>> tcp into a pipe when the recvq was full. Attached find a
>>> test program that shows this behaviour, on another host I
>>> started
>>>
>>> netcat 192.168.19.10 4711 < /dev/zero
>>>
>>> vlendec@lenny:~$ uname -a
>>> Linux lenny 2.6.28-06857-g5cbd04a #7 Wed Jan 7 10:10:42 CET 2009 x86_64 GNU/Linux
>>> vlendec@lenny:~$ gcc -o splicetest /host/home/vlendec/splicetest.c -O3 -Wall
>>> vlendec@lenny:~$ ./splicetest out 65536 &
>>> [1] 697
>>> vlendec@lenny:~$ strace -p 697
>>> Process 697 attached - interrupt to quit
>>> splice(0x3, 0, 0x5, 0, 0x56a0, 0x1) = 22176
>>> splice(0x7, 0, 0x4, 0, 0x10000, 0x1^C <unfinished ...>
>
> Volker, is your splice() a blocking one, from a tcp socket to a pipe?
>
> If no other thread is reading the pipe, then you might block forever
> in splice_to_pipe() as soon as the pipe is full (16 pages).
>
> As pages are not necessarily full (each skb will use at least one page, even if
> its length is small), it is not really possible to use splice() like this.
>
> In your case, the only safe way with the current kernel would be to call splice()
> asking for no more than 16 bytes, which would be really insane for your needs.
>
> You may prefer a non-blocking mode, at least when calling splice_to_pipe().
>
> Maybe the SPLICE_F_NONBLOCK splice() flag should only apply on the pipe side.
> tcp_splice_read() should not use this flag to select a blocking/nonblocking
> mode on the source socket, but the underlying file flag.
>
> This way, your program could leave the socket in blocking mode, yet call splice()
> with the SPLICE_F_NONBLOCK flag to not block on the pipe.
>
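
The point in the quoted text that a pipe fills by page slots rather than by
bytes can be illustrated from user space. The sketch below is only an
analogy, not the TCP path itself: it swaps in vmsplice(2) with one-byte
iovec segments to stand in for small skbs, since each segment occupies its
own pipe buffer. The helper name one_byte_segments_that_fit() and the
constant NSEG are made up for this illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

enum { NSEG = 256 };	/* hypothetical: more segments than a default pipe has slots */

/*
 * Hypothetical helper: offer NSEG one-byte segments to a fresh pipe
 * without blocking. Returns how many bytes the pipe accepted (or -1),
 * and stores the pipe's slot count (capacity / page size) in *slots_out.
 */
static long one_byte_segments_that_fit(long *slots_out)
{
	static char bytes[NSEG];
	struct iovec iov[NSEG];
	int pfd[2];
	long page, cap;
	ssize_t n;
	int i;

	if (pipe(pfd) < 0)
		return -1;

	page = sysconf(_SC_PAGESIZE);
	cap = fcntl(pfd[1], F_GETPIPE_SZ);	/* pipe capacity in bytes */

	for (i = 0; i < NSEG; i++) {
		iov[i].iov_base = &bytes[i];
		iov[i].iov_len = 1;		/* one byte per segment */
	}

	/* SPLICE_F_NONBLOCK: stop instead of sleeping once the pipe is full */
	n = vmsplice(pfd[1], iov, NSEG, SPLICE_F_NONBLOCK);

	close(pfd[0]);
	close(pfd[1]);

	if (page <= 0 || cap <= 0)
		return -1;
	*slots_out = cap / page;
	return (long)n;
}
```

On a default pipe one would expect this to stop after roughly as many bytes
as there are slots, far below the byte capacity of the pipe, which is the
situation a blocking splice() from a socket full of small skbs runs into.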
This patch, coupled with the previous one from Willy Tarreau
(tcp: splice as many packets as possible at once),
gives the expected result.

[PATCH] net: splice() from tcp to pipe should take into account O_NONBLOCK

Instead of using SPLICE_F_NONBLOCK to select a non-blocking mode both on
the source tcp socket and the pipe destination, we use the underlying file flag
(O_NONBLOCK) to select a non-blocking socket.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
diff --git a/include/linux/net.h b/include/linux/net.h
index 4515efa..10e38d1 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -185,7 +185,7 @@ struct proto_ops {
struct vm_area_struct * vma);
ssize_t (*sendpage) (struct socket *sock, struct page *page,
int offset, size_t size, int flags);
- ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
+ ssize_t (*splice_read)(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len, unsigned int flags);
};
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 218235d..e8e7f80 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -309,7 +309,7 @@ extern int tcp_twsk_unique(struct sock *sk,
extern void tcp_twsk_destructor(struct sock *sk);
-extern ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos,
+extern ssize_t tcp_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len, unsigned int flags);
static inline void tcp_dec_quickack_mode(struct sock *sk,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ce572f9..c777d88 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -548,10 +548,11 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
* Will read pages from given socket and fill them into a pipe.
*
**/
-ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
+ssize_t tcp_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags)
{
+ struct socket *sock = file->private_data;
struct sock *sk = sock->sk;
struct tcp_splice_state tss = {
.pipe = pipe,
@@ -572,7 +573,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
lock_sock(sk);
- timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+ timeo = sock_rcvtimeo(sk, file->f_flags & O_NONBLOCK);
while (tss.len) {
ret = __tcp_splice_read(sk, &tss);
if (ret < 0)