From: Octavian Purdila <opurdila@ixiacom.com>
To: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH] tcp: do not promote SPLICE_F_NONBLOCK to socket O_NONBLOCK
Date: Thu, 17 Jul 2008 16:33:49 +0300
Message-ID: <200807171633.49791.opurdila@ixiacom.com>
commit 11134aa8499b6fd67569e8fd21bde6fc481898d1
Author: Octavian Purdila <opurdila@ixiacom.com>
Date: Thu Jul 17 16:25:23 2008 +0300
tcp: do not promote SPLICE_F_NONBLOCK to socket O_NONBLOCK
This patch changes tcp_splice_read to the behavior implied by man 2
splice:
SPLICE_F_NONBLOCK - Do not block on I/O. This makes the splice
pipe operations non-blocking, but splice() may nevertheless block
because the file descriptors that are spliced to/from may block
(unless they have the O_NONBLOCK flag set).
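With the semantics quoted above, a caller that also wants the socket side
of splice() to be non-blocking sets O_NONBLOCK on the descriptor itself.
A minimal user-space sketch of that step follows; set_nonblock() is a
hypothetical helper name used only for illustration and is not part of
this patch.

```c
#include <fcntl.h>

/*
 * Set (on != 0) or clear (on == 0) O_NONBLOCK on fd.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int set_nonblock(int fd, int on)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl < 0)
		return -1;
	fl = on ? (fl | O_NONBLOCK) : (fl & ~O_NONBLOCK);
	return fcntl(fd, F_SETFL, fl) < 0 ? -1 : 0;
}
```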
This approach also provides a simple solution to the splice
transfer size problem. Say we have the following common sequence:
splice(socket, pipe);
splice(pipe, file);
Unless we specify SPLICE_F_NONBLOCK, we can't use arbitrarily large
transfer sizes with the 1st splice, since otherwise we will deadlock
once the pipe is full. But if we use SPLICE_F_NONBLOCK, the current
implementation makes the underlying socket non-blocking as well, and
thus forces us to use poll or another async I/O notification mechanism.
Choosing a splice transfer size that won't deadlock is not trivial: we
need to stay under PIPE_BUFFERS packets and since packets can have
arbitrary sizes we will need to be conservative and use a small
transfer size. That can degrade performance due to excessive system
calls.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 56a133c..cc5082b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -570,7 +570,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,

 	lock_sock(sk);

-	timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+	timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
 	while (tss.len) {
 		ret = __tcp_splice_read(sk, &tss);
 		if (ret < 0)
@@ -578,10 +578,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 		else if (!ret) {
 			if (spliced)
 				break;
-			if (flags & SPLICE_F_NONBLOCK) {
-				ret = -EAGAIN;
-				break;
-			}
 			if (sock_flag(sk, SOCK_DONE))
 				break;
 			if (sk->sk_err) {
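
The effect of the first hunk on the receive timeout can be modelled in
user space. This is a sketch, not kernel code: model_rcvtimeo() stands
in for the kernel's sock_rcvtimeo(), which returns 0 ("do not wait")
when its noblock argument is non-zero and the socket's receive timeout
otherwise. timeo_before() and timeo_after() are hypothetical names, and
the model covers only the timeout selection, not the removed -EAGAIN
branch of the second hunk.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/* Stand-in for sock_rcvtimeo(): a result of 0 means "do not wait". */
static long model_rcvtimeo(long sk_rcvtimeo, int noblock)
{
	return noblock ? 0 : sk_rcvtimeo;
}

/* Before the patch: the splice flag decides whether the socket read waits. */
long timeo_before(long sk_rcvtimeo, unsigned int splice_flags, int f_flags)
{
	(void)f_flags;
	return model_rcvtimeo(sk_rcvtimeo, splice_flags & SPLICE_F_NONBLOCK);
}

/* After the patch: only the socket's own O_NONBLOCK flag decides. */
long timeo_after(long sk_rcvtimeo, unsigned int splice_flags, int f_flags)
{
	(void)splice_flags;
	return model_rcvtimeo(sk_rcvtimeo, f_flags & O_NONBLOCK);
}
```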