* SPLICE_F_NONBLOCK semantics...
@ 2009-10-01 22:11 David Miller
2009-10-01 22:21 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2009-10-01 22:11 UTC (permalink / raw)
To: torvalds; +Cc: eric.dumazet, jgunthorpe, vl, opurdila, netdev, linux-kernel
Linus, I plan on putting the fix below into my tree.
It depends upon our interpretation of how you intended the
SPLICE_F_NONBLOCK flag to work when you added it way back
when.
Could you take a quick look and make sure our interpretation matches
your intent? This behavior has been bugging people for a while and I
want to close this out, one way or another.
Thanks!
[PATCH] net: splice() from tcp to pipe should take into account O_NONBLOCK
tcp_splice_read() doesnt take into account socket's O_NONBLOCK flag
Before this patch :
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
causes a random endless block (if pipe is full) and
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
will return 0 immediately if the TCP buffer is empty.
User application has no way to instruct splice() that socket should be in blocking mode
but pipe in nonblock more.
Many projects cannot use splice(tcp -> pipe) because of this flaw.
http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html
Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
(splice: add SPLICE_F_NONBLOCK flag )
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only
This patch instruct tcp_splice_read() to use the underlying file O_NONBLOCK
flag, as other socket operations do.
Users will then call :
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );
to block on data coming from socket (if file is in blocking mode),
and not block on pipe output (to avoid deadlock)
First version of this patch was submitted by Octavian Purdila
Reported-by: Volker Lendecke <vl@samba.org>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 21387eb..8cdfab6 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -580,7 +580,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
lock_sock(sk);
- timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+ timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
while (tss.len) {
ret = __tcp_splice_read(sk, &tss);
if (ret < 0)
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: SPLICE_F_NONBLOCK semantics...
2009-10-01 22:11 SPLICE_F_NONBLOCK semantics David Miller
@ 2009-10-01 22:21 ` Linus Torvalds
2009-10-01 22:24 ` Linus Torvalds
2009-10-01 22:27 ` David Miller
0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2009-10-01 22:21 UTC (permalink / raw)
To: David Miller
Cc: eric.dumazet, jgunthorpe, vl, opurdila, netdev,
Linux Kernel Mailing List, Jens Axboe
On Thu, 1 Oct 2009, David Miller wrote:
>
> It depends upon our interpretation of how you intended the
> SPLICE_F_NONBLOCK flag to work when you added it way back
> when.
>
> Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
> (splice: add SPLICE_F_NONBLOCK flag )
>
> It doesn't make the splice itself necessarily nonblocking (because the
> actual file descriptors that are spliced from/to may block unless they
> have the O_NONBLOCK flag set), but it makes the splice pipe operations
> nonblocking.
>
> Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only
Ack. The original intent was for the flag to affect the buffering, not the
end points.
> splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );
>
> to block on data coming from socket (if file is in blocking mode),
> and not block on pipe output (to avoid deadlock)
Yes. Sounds correct. Although the more I think about it, the more I
suspect that the whole NONBLOCK thing should probably have been two bits,
and simply been about "nonblocking input" vs "nonblocking output" (so that
you could control both sides on a call-by-call basis).
But I think that your patch is fundamentally correct with the semantics
as-is.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: SPLICE_F_NONBLOCK semantics...
2009-10-01 22:21 ` Linus Torvalds
@ 2009-10-01 22:24 ` Linus Torvalds
2009-10-01 22:27 ` David Miller
1 sibling, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2009-10-01 22:24 UTC (permalink / raw)
To: David Miller
Cc: eric.dumazet, jgunthorpe, vl, opurdila, netdev,
Linux Kernel Mailing List, Jens Axboe
On Thu, 1 Oct 2009, Linus Torvalds wrote:
>
> Ack. The original intent was for the flag to affect the buffering, not the
> end points.
Side note, in case it wasn't clear: I added Jens to the cc, because in the
end my "original intent" probably doesn't matter all that much. I may have
set up the basic ideas and so on, but I never wrote any apps that use it,
and Jens has done most of the actual fixes since.
So give "my original intent" the weight (or rather, lack there-of) that it
deserves.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: SPLICE_F_NONBLOCK semantics...
2009-10-01 22:21 ` Linus Torvalds
2009-10-01 22:24 ` Linus Torvalds
@ 2009-10-01 22:27 ` David Miller
2009-10-02 7:47 ` Jens Axboe
1 sibling, 1 reply; 6+ messages in thread
From: David Miller @ 2009-10-01 22:27 UTC (permalink / raw)
To: torvalds
Cc: eric.dumazet, jgunthorpe, vl, opurdila, netdev, linux-kernel,
jens.axboe
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 1 Oct 2009 15:21:44 -0700 (PDT)
> On Thu, 1 Oct 2009, David Miller wrote:
>>
>> It depends upon our interpretation of how you intended the
>> SPLICE_F_NONBLOCK flag to work when you added it way back
>> when.
>>
>> Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
>> (splice: add SPLICE_F_NONBLOCK flag )
>>
>> It doesn't make the splice itself necessarily nonblocking (because the
>> actual file descriptors that are spliced from/to may block unless they
>> have the O_NONBLOCK flag set), but it makes the splice pipe operations
>> nonblocking.
>>
>> Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only
>
> Ack. The original intent was for the flag to affect the buffering, not the
> end points.
Great, thanks for reviewing.
> Although the more I think about it, the more I suspect that the
> whole NONBLOCK thing should probably have been two bits, and simply
> been about "nonblocking input" vs "nonblocking output" (so that you
> could control both sides on a call-by-call basis).
I think we could still extend things in this way if we wanted to.
So if you specify the explicit input and/or output nonblock flag,
it takes precedence over the SPLICE_F_NONBLOCK thing.
Anyways, just an idea.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: SPLICE_F_NONBLOCK semantics...
2009-10-01 22:27 ` David Miller
@ 2009-10-02 7:47 ` Jens Axboe
2009-10-02 16:45 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2009-10-02 7:47 UTC (permalink / raw)
To: David Miller
Cc: torvalds, eric.dumazet, jgunthorpe, vl, opurdila, netdev,
linux-kernel
On Thu, Oct 01 2009, David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Thu, 1 Oct 2009 15:21:44 -0700 (PDT)
>
> > On Thu, 1 Oct 2009, David Miller wrote:
> >>
> >> It depends upon our interpretation of how you intended the
> >> SPLICE_F_NONBLOCK flag to work when you added it way back
> >> when.
> >>
> >> Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
> >> (splice: add SPLICE_F_NONBLOCK flag )
> >>
> >> It doesn't make the splice itself necessarily nonblocking (because the
> >> actual file descriptors that are spliced from/to may block unless they
> >> have the O_NONBLOCK flag set), but it makes the splice pipe operations
> >> nonblocking.
> >>
> >> Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only
> >
> > Ack. The original intent was for the flag to affect the buffering, not the
> > end points.
>
> Great, thanks for reviewing.
>
> > Although the more I think about it, the more I suspect that the
> > whole NONBLOCK thing should probably have been two bits, and simply
> > been about "nonblocking input" vs "nonblocking output" (so that you
> > could control both sides on a call-by-call basis).
>
> I think we could still extend things in this way if we wanted to.
> So if you specify the explicit input and/or output nonblock flag,
> it takes precedence over the SPLICE_F_NONBLOCK thing.
Yes I agree, thank god for having a 'flags' parameter for the syscalls
:-). I'll make a note to add and test bidirectional nonblock hints.
The net patch looks fine and correct to me, feel free to add my acked-by
if you want.
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: SPLICE_F_NONBLOCK semantics...
2009-10-02 7:47 ` Jens Axboe
@ 2009-10-02 16:45 ` David Miller
0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2009-10-02 16:45 UTC (permalink / raw)
To: jens.axboe
Cc: torvalds, eric.dumazet, jgunthorpe, vl, opurdila, netdev,
linux-kernel
From: Jens Axboe <jens.axboe@oracle.com>
Date: Fri, 2 Oct 2009 09:47:54 +0200
> The net patch looks fine and correct to me, feel free to add my acked-by
> if you want.
Thanks Jens.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-10-02 16:45 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-01 22:11 SPLICE_F_NONBLOCK semantics David Miller
2009-10-01 22:21 ` Linus Torvalds
2009-10-01 22:24 ` Linus Torvalds
2009-10-01 22:27 ` David Miller
2009-10-02 7:47 ` Jens Axboe
2009-10-02 16:45 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).