* Splice on blocking TCP sockets again..
@ 2009-09-30 0:48 Jason Gunthorpe
2009-09-30 4:54 ` Eric Dumazet
2009-09-30 6:37 ` Volker Lendecke
0 siblings, 2 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2009-09-30 0:48 UTC (permalink / raw)
To: Eric Dumazet, netdev; +Cc: David S. Miller, Volker Lendecke
Eric,
I saw your patch from January regarding splicing on blocking sockets,
and I wondered what ever happened to it?
http://lkml.org/lkml/2009/1/13/507
It doesn't look like it has been applied.. I see the patch thread died
at davem's comments?
I have run into exactly the same problem as Samba, where I'd like the
TCP socket to be blocking, and the pipe to be non blocking ...
As it stands,
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
causes a random endless block and
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
will return 0 immediately if the TCP buffer is empty.
FWIW, it looks like samba has a splice code now, but doesn't enable it
due to this issue?
http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
Thanks,
Jason
* Re: Splice on blocking TCP sockets again..
2009-09-30 0:48 Splice on blocking TCP sockets again Jason Gunthorpe
@ 2009-09-30 4:54 ` Eric Dumazet
2009-09-30 5:40 ` Jason Gunthorpe
2009-09-30 6:37 ` Volker Lendecke
1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2009-09-30 4:54 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: netdev, David S. Miller, Volker Lendecke
Jason Gunthorpe wrote:
> Eric,
>
> I saw your patch from January regarding splicing on blocking sockets,
> and I wondered what ever happened to it?
>
> http://lkml.org/lkml/2009/1/13/507
>
> It doesn't look like it has been applied.. I see the patch thread died
> at davem's comments?
>
> I have run into exactly the same problem as Samba, where I'd like the
> TCP socket to be blocking, and the pipe to be non blocking ...
>
> As it stands,
> splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
> causes a random endless block and
> splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
> will return 0 immediately if the TCP buffer is empty.
>
> FWIW, it looks like samba has a splice code now, but doesn't enable it
> due to this issue?
>
> http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
>
> Thanks,
> Jason
Hi Jason, thanks for the reminder.
Hmm, most probably I did not reply correctly to David's objection, which was:
Date Wed, 14 Jan 2009 20:58:39 -0800 (PST)
Subject Re: maximum buffer size for splice(2) tcp->pipe?
From David Miller <>
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Wed, 14 Jan 2009 00:38:32 +0100
> [PATCH] net: splice() from tcp to socket should take into account O_NONBLOCK
>
> Instead of using SPLICE_F_NONBLOCK to select a non blocking mode both on
> source tcp socket and pipe destination, we use the underlying file flag (O_NONBLOCK)
> for selecting a non blocking socket.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
This needs at least some more thought.
It seems, for one thing, that this change will interfere with the
intentions of the code in splice_direct_to_actor which goes:
/*
* Don't block on output, we have to drain the direct pipe.
*/
sd->flags &= ~SPLICE_F_NONBLOCK;
------------------------------------------------------------------------------
But splice_direct_to_actor() handles a REG/BLK file as input and a pipe as output,
so I believe my patch won't change splice_direct_to_actor() behavior.
My patch title was wrong:
net: splice() from tcp to socket should take into account O_NONBLOCK
So maybe David was misled by the title :)
[PATCH] net: splice() from tcp to pipe should take into account O_NONBLOCK
Before this patch :
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
causes a random endless block (if pipe is full) and
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
will return 0 immediately if the TCP buffer is empty.
A user application has no way to tell splice() that the socket should be in blocking mode
but the pipe in non-blocking mode.
http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
One way to handle this is to switch tcp_read() to use the underlying file O_NONBLOCK
flag, as other socket operations do. And let SPLICE_F_NONBLOCK control the pipe output only.
Users will then call :
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );
to block on data coming from socket (if file is in blocking mode),
and not block on pipe output (to avoid deadlock)
Reported-by: Volker Lendecke <vl@samba.org>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 21387eb..8cdfab6 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -580,7 +580,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
lock_sock(sk);
- timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+ timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
while (tss.len) {
ret = __tcp_splice_read(sk, &tss);
if (ret < 0)
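
For illustration only, the call pattern described in the changelog above could be wrapped in userspace roughly as follows. This is a minimal sketch, assuming a kernel in which tcp_splice_read() honors the socket's O_NONBLOCK flag; the helper names set_nonblocking() and splice_socket_to_pipe() are hypothetical and are not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set (enable != 0) or clear (enable == 0) O_NONBLOCK
 * on fd. Returns 0 on success, -1 on error with errno set by fcntl(). */
int set_nonblocking(int fd, int enable)
{
    int fl;

    assert(fd >= 0);
    fl = fcntl(fd, F_GETFL);
    if (fl == -1)
        return -1;
    fl = enable ? (fl | O_NONBLOCK) : (fl & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, fl);
}

/* Hypothetical wrapper: put the socket in blocking mode, then splice up to
 * len bytes into the write end of a pipe. SPLICE_F_NONBLOCK is passed so
 * that a full pipe does not stall the call. Whether the call waits for
 * socket data depends on the kernel (assumption: patched behavior). */
ssize_t splice_socket_to_pipe(int sock, int pipe_wr, size_t len)
{
    if (set_nonblocking(sock, 0) == -1)
        return -1;
    return splice(sock, NULL, pipe_wr, NULL, len,
                  SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
}
```

A caller would typically use it as splice_socket_to_pipe(sock, pipefd[1], 128 * 1024) and then drain pipefd[0] with a second splice().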
* Re: Splice on blocking TCP sockets again..
2009-09-30 4:54 ` Eric Dumazet
@ 2009-09-30 5:40 ` Jason Gunthorpe
2009-09-30 5:51 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2009-09-30 5:40 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, David S. Miller, Volker Lendecke
> One way to handle this is to switch tcp_read() to use the underlying file O_NONBLOCK
> flag, as other socket operations do. And let SPLICE_F_NONBLOCK control the pipe output only.
Thanks Eric, this seems reasonable from my userspace perspective.
I admit I don't understand why SPLICE_F_NONBLOCK exists, it seems very
un-unixy to have a syscall completely ignore the NONBLOCK flag of the
fd it is called on. Ie setting NONBLOCK on the pipe itself does
nothing when using splice..
Jason
* Re: Splice on blocking TCP sockets again..
2009-09-30 5:40 ` Jason Gunthorpe
@ 2009-09-30 5:51 ` Eric Dumazet
2009-09-30 6:00 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2009-09-30 5:51 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: netdev, David S. Miller, Volker Lendecke
Jason Gunthorpe wrote:
>> One way to handle this is to switch tcp_read() to use the underlying file O_NONBLOCK
>> flag, as other socket operations do. And let SPLICE_F_NONBLOCK control the pipe output only.
Argh, this should have been tcp_splice_read(), of course.
>
> Thanks Eric, this seems reasonable from my userspace perspective.
>
> I admit I don't understand why SPLICE_F_NONBLOCK exists, it seems very
> un-unixy to have a syscall completely ignore the NONBLOCK flag of the
> fd it is called on. Ie setting NONBLOCK on the pipe itself does
> nothing when using splice..
>
Hmm, good question. I don't have the answer, but I'll dig one up.
* Re: Splice on blocking TCP sockets again..
2009-09-30 5:51 ` Eric Dumazet
@ 2009-09-30 6:00 ` Eric Dumazet
2009-09-30 6:19 ` Eric Dumazet
2009-10-01 22:17 ` Jason Gunthorpe
0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2009-09-30 6:00 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: netdev, David S. Miller, Volker Lendecke
Eric Dumazet wrote:
> Jason Gunthorpe wrote:
>>> One way to handle this is to switch tcp_read() to use the underlying file O_NONBLOCK
>>> flag, as other socket operations do. And let SPLICE_F_NONBLOCK control the pipe output only.
>
> arg, this was tcp_splice_read() of course
>
>> Thanks Eric, this seems reasonable from my userspace perspective.
>>
>> I admit I don't understand why SPLICE_F_NONBLOCK exists, it seems very
>> un-unixy to have a syscall completely ignore the NONBLOCK flag of the
>> fd it is called on. Ie setting NONBLOCK on the pipe itself does
>> nothing when using splice..
>>
>
> Hmm, good question, I dont have the answer but I'll digg one.
>
commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
splice: add SPLICE_F_NONBLOCK flag
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
So Linus's intention was pretty clear: O_NONBLOCK should be taken into account
by the 'actual file descriptors that are spliced from/to', regardless of the SPLICE_F_NONBLOCK flag.
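
As a small illustration of the pipe-side semantics in that commit message, the sketch below splices from the read end of a pipe with SPLICE_F_NONBLOCK. The helper name drain_pipe_nonblock() is hypothetical, and the sketch only exercises the pipe side; it says nothing about how a socket or file on the other end behaves.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move up to len bytes from the read end of a pipe to
 * out_fd without waiting for data to arrive in the pipe.
 * Returns the number of bytes moved, 0 if the pipe is empty and has no
 * writers left, or -1 with errno == EAGAIN if the pipe is empty but still
 * has writers. */
ssize_t drain_pipe_nonblock(int pipe_rd, int out_fd, size_t len)
{
    assert(pipe_rd >= 0 && out_fd >= 0);
    return splice(pipe_rd, NULL, out_fd, NULL, len, SPLICE_F_NONBLOCK);
}
```

Without SPLICE_F_NONBLOCK the same call would wait until a writer puts data into the pipe.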
* Re: Splice on blocking TCP sockets again..
2009-09-30 6:00 ` Eric Dumazet
@ 2009-09-30 6:19 ` Eric Dumazet
2009-10-01 22:17 ` Jason Gunthorpe
1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2009-09-30 6:19 UTC (permalink / raw)
Cc: Jason Gunthorpe, netdev, David S. Miller, Volker Lendecke,
Octavian Purdila
Eric Dumazet wrote:
> Eric Dumazet wrote:
>> Jason Gunthorpe wrote:
>>>> One way to handle this is to switch tcp_read() to use the underlying file O_NONBLOCK
>>>> flag, as other socket operations do. And let SPLICE_F_NONBLOCK control the pipe output only.
>> arg, this was tcp_splice_read() of course
>>
>>> Thanks Eric, this seems reasonable from my userspace perspective.
>>>
>>> I admit I don't understand why SPLICE_F_NONBLOCK exists, it seems very
>>> un-unixy to have a syscall completely ignore the NONBLOCK flag of the
>>> fd it is called on. Ie setting NONBLOCK on the pipe itself does
>>> nothing when using splice..
>>>
>> Hmm, good question, I dont have the answer but I'll digg one.
>>
>
> commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
> splice: add SPLICE_F_NONBLOCK flag
>
> It doesn't make the splice itself necessarily nonblocking (because the
> actual file descriptors that are spliced from/to may block unless they
> have the O_NONBLOCK flag set), but it makes the splice pipe operations
> nonblocking.
>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
>
> See Linus intention was pretty clear : O_NONBLOCK should be taken into account
> by 'actual file that are spliced from/to', regardless of SPLICE_F_NONBLOCK flag
>
I also found the first submission of this patch, from Octavian Purdila,
so credit should be given to Octavian as well.
http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html
We could add Linus to the discussion if that would help make progress on this point.
I personally stopped using splice(tcp -> pipe) in my projects because it could not be used
reliably.
Thanks
[PATCH] net: splice() from tcp to pipe should take into account O_NONBLOCK
tcp_splice_read() doesn't take into account the socket's O_NONBLOCK flag.
Before this patch :
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
causes a random endless block (if pipe is full) and
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
will return 0 immediately if the TCP buffer is empty.
A user application has no way to tell splice() that the socket should be in blocking mode
but the pipe in non-blocking mode.
Many projects cannot use splice(tcp -> pipe) because of this flaw.
http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html
Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
(splice: add SPLICE_F_NONBLOCK flag )
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Linus's intention was clear: let SPLICE_F_NONBLOCK control the splice pipe mode only.
This patch makes tcp_splice_read() use the underlying file's O_NONBLOCK
flag, as other socket operations do.
Users will then call :
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );
to block on data coming from socket (if file is in blocking mode),
and not block on pipe output (to avoid deadlock)
First version of this patch was submitted by Octavian Purdila
Reported-by: Volker Lendecke <vl@samba.org>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 21387eb..8cdfab6 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -580,7 +580,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
lock_sock(sk);
- timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+ timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
while (tss.len) {
ret = __tcp_splice_read(sk, &tss);
if (ret < 0)
* Re: Splice on blocking TCP sockets again..
2009-09-30 0:48 Splice on blocking TCP sockets again Jason Gunthorpe
2009-09-30 4:54 ` Eric Dumazet
@ 2009-09-30 6:37 ` Volker Lendecke
2009-10-02 17:10 ` Jason Gunthorpe
1 sibling, 1 reply; 10+ messages in thread
From: Volker Lendecke @ 2009-09-30 6:37 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Eric Dumazet, netdev, David S. Miller, Volker Lendecke
On Tue, Sep 29, 2009 at 06:48:20PM -0600, Jason Gunthorpe wrote:
> FWIW, it looks like samba has a splice code now, but doesn't enable it
> due to this issue?
Right. What I've learned from the comments is that splice is
only usable in multi-threaded programs. One thread is
reading, one is writing from the other end. I deferred using
splice until we have the proper architecture to do sync
syscalls in helper threads to make them virtually async. We
have some code for that now, but it's not a high priority
for me at this moment.
Volker
* Re: Splice on blocking TCP sockets again..
2009-09-30 6:00 ` Eric Dumazet
2009-09-30 6:19 ` Eric Dumazet
@ 2009-10-01 22:17 ` Jason Gunthorpe
1 sibling, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2009-10-01 22:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, David S. Miller, Volker Lendecke, linux-kernel
On Wed, Sep 30, 2009 at 08:00:04AM +0200, Eric Dumazet wrote:
> >> I admit I don't understand why SPLICE_F_NONBLOCK exists, it seems very
> >> un-unixy to have a syscall completely ignore the NONBLOCK flag of the
> >> fd it is called on. Ie setting NONBLOCK on the pipe itself does
> >> nothing when using splice..
> >
> > Hmm, good question, I dont have the answer but I'll digg one.
> >
>
> commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
> splice: add SPLICE_F_NONBLOCK flag
>
> It doesn't make the splice itself necessarily nonblocking (because the
> actual file descriptors that are spliced from/to may block unless they
> have the O_NONBLOCK flag set), but it makes the splice pipe operations
> nonblocking.
>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
> See Linus intention was pretty clear : O_NONBLOCK should be taken
> into account by 'actual file that are spliced from/to', regardless
> of SPLICE_F_NONBLOCK flag
Yes, that seems reasonable.
What confuses me is that if O_NONBLOCK is set on the _pipe_ and
SPLICE_F_NONBLOCK is not set on the splice call, the splice still blocks.
That is unlike other unix APIs, e.g. MSG_DONTWAIT.
It seems to me that SPLICE_F_NONBLOCK should be or'd with O_NONBLOCK on
the pipe?
Thanks,
Jason
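
Until the kernel does something like that, an application could approximate the suggestion itself. The following is only a userspace sketch of the idea; splice_flags_for_pipe() is a hypothetical helper name, not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return base_flags with SPLICE_F_NONBLOCK added when
 * pipe_fd has O_NONBLOCK set. If the file status flags cannot be read,
 * base_flags is returned unchanged. */
unsigned int splice_flags_for_pipe(int pipe_fd, unsigned int base_flags)
{
    int fl;

    assert(pipe_fd >= 0);
    fl = fcntl(pipe_fd, F_GETFL);
    if (fl != -1 && (fl & O_NONBLOCK))
        base_flags |= SPLICE_F_NONBLOCK;
    return base_flags;
}
```

The result would be passed as the last argument of splice(), for example splice(sock, NULL, pipefd[1], NULL, len, splice_flags_for_pipe(pipefd[1], SPLICE_F_MOVE)).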
* Re: Splice on blocking TCP sockets again..
2009-09-30 6:37 ` Volker Lendecke
@ 2009-10-02 17:10 ` Jason Gunthorpe
2009-10-02 18:05 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2009-10-02 17:10 UTC (permalink / raw)
To: Volker Lendecke; +Cc: Eric Dumazet, netdev, Volker Lendecke
On Wed, Sep 30, 2009 at 08:37:13AM +0200, Volker Lendecke wrote:
> On Tue, Sep 29, 2009 at 06:48:20PM -0600, Jason Gunthorpe wrote:
> > FWIW, it looks like samba has a splice code now, but doesn't enable it
> > due to this issue?
>
> Right. What I've learned from the comments is that splice is
> only usable in multi-threaded programs. One thread is
> reading, one is writing from the other end. I deferred using
> splice until we have the proper architecture to do sync
> syscalls in helper threads to make them virtually async. We
> have some code for that now, but it's not a high priority
> for me at this moment.
So it looks like, thanks to Eric and davem, splice will be changed
so that it can block on the TCP socket while staying non-blocking on the pipe.
I'd suggest a construct like the following as a compatibility
solution:
int pipefd[2];  /* set up earlier with pipe(pipefd) */
struct pollfd pfd = {.fd = tcpfd, .events = POLLIN | POLLRDHUP};
while (..) {
    rc = splice(tcpfd,0,pipefd[1],0,count,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
    if (rc == -1)
        //...
    if (rc == 0) {
        if (pfd.revents & POLLRDHUP)
            // oops, EOF on TCP
        /* Might be an old kernel that nonblocks on TCP, have to check
           if this is EOF or do blocking. */
        rc = poll(&pfd,1,-1);
        if (rc == -1)
            //...
        continue;  /* nothing is in the pipe yet, retry the socket splice */
    }
    rc = splice(pipefd[0],0,ofd,0,..., SPLICE_F_MOVE);
}
This should add no overhead in the case where the new splice blocks on the
socket, and it falls back gracefully on older kernels.
Thanks,
Jason
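
For readers who want something compilable, the construct above might be fleshed out as in the sketch below. It is one possible reading of the pseudocode, not a tested recipe for any particular kernel: the names relay_socket() and drain_pipe() and the CHUNK size are assumptions, and POLLHUP is checked in addition to POLLRDHUP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (128 * 1024)  /* per-call limit, same size as used in this thread */

/* Move exactly n bytes from pipe_rd to ofd, blocking as needed.
 * Returns 0 on success, -1 on error or unexpected end of pipe. */
int drain_pipe(int pipe_rd, int ofd, size_t n)
{
    while (n > 0) {
        ssize_t w = splice(pipe_rd, NULL, ofd, NULL, n, SPLICE_F_MOVE);
        if (w == -1 && errno == EINTR)
            continue;
        if (w <= 0)
            return -1;
        n -= (size_t)w;
    }
    return 0;
}

/* Relay everything readable from sockfd to ofd through the pipe pipefd,
 * until EOF. Returns the total number of bytes relayed, or -1 on error. */
ssize_t relay_socket(int sockfd, int pipefd[2], int ofd)
{
    struct pollfd pfd = { .fd = sockfd, .events = POLLIN | POLLRDHUP };
    ssize_t total = 0;

    assert(sockfd >= 0 && ofd >= 0);
    for (;;) {
        ssize_t n = splice(sockfd, NULL, pipefd[1], NULL, CHUNK,
                           SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
        if (n > 0) {
            /* Empty the pipe completely before reading more. */
            if (drain_pipe(pipefd[0], ofd, (size_t)n) == -1)
                return -1;
            total += n;
            continue;
        }
        if (n == -1 && errno != EAGAIN && errno != EINTR)
            return -1;
        /* A zero return after poll() reported a hangup is a real EOF. */
        if (n == 0 && (pfd.revents & (POLLRDHUP | POLLHUP)))
            return total;
        /* No data was moved: either the kernel did not block on the
         * socket, or this is an EOF we have not confirmed yet. Wait for
         * data or a hangup, then retry the splice. */
        if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
            return -1;
    }
}
```

On a kernel where the socket side blocks, the poll() branch should only be reached around EOF; on an older kernel it does the waiting that splice() does not.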
* Re: Splice on blocking TCP sockets again..
2009-10-02 17:10 ` Jason Gunthorpe
@ 2009-10-02 18:05 ` Eric Dumazet
0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2009-10-02 18:05 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Volker Lendecke, netdev, Volker Lendecke
Jason Gunthorpe wrote:
>
> I'd suggest a construct like the following as a compatability
> solution:
>
> struct pollfd pfd = {.fd = tcpfd, events = POLLIN | POLLRDHUP};
> while (..) {
> rc = splice(tcpfd,0,pfd[1],0,count,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
> if (rc == -1)
> //...
> if (rc == 0) {
> if (pfd.revents & POLLRDHUP)
> // oops, EOF on TCP
>
> /* Might be an old kernel that nonblocks on TCP, have to check
> if this is EOF or do blocking. */
> rc = poll(&pfd,1,-1);
> if (rc == -1)
> //...
> }
>
> rc = splice(pfd[0],0,ofd,0,..., SPLICE_F_MOVE)
> }
>
> Which should add no overhead in the new splice blocks case, and falls
> back gracefully on older kernels..
>
Agreed, thanks for the tip.
Indeed, a new kernel will permit a loop with only splice() syscalls, while on an old
kernel some poll() syscalls might be needed if the TCP socket is empty.