netdev.vger.kernel.org archive mirror
From: Willy Tarreau <w@1wt.eu>
To: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Evgeniy Polyakov <zbr@ioremap.net>,
	Jens Axboe <jens.axboe@oracle.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: Data corruption issue with splice() on 2.6.27.10
Date: Mon, 19 Jan 2009 10:53:21 +0100
Message-ID: <20090119095320.GC14531@1wt.eu>
In-Reply-To: <20090119083921.GA17124@xi.wantstofly.org>

On Mon, Jan 19, 2009 at 09:39:21AM +0100, Lennert Buytenhek wrote:
> On Tue, Jan 06, 2009 at 07:50:12PM +0100, Willy Tarreau wrote:
> 
> > > Thanks a lot for the test application, it will greatly help to resolve
> > > this issue.
> > 
> > I figured it was an absolute necessity. The original code in my proxy is in
> > an experimental state and far too hard to debug for these purposes. It was
> > enough to detect the problem, but I could run a lot more tests with this
> > small test app! And who knows, maybe it will serve as an example for
> > non-blocking splice ;-)
> 
> :-)
> 
> Just to throw some more (hacky) example code into the pool, below is
> an echo server that I hacked up to test nonblocking splice().  (You'll
> need sf.net/projects/libivykis to use it.)  I also have a splice()
> discard server and a patch to my intercept-connection-via-iptables-and-
> forward-it-to-a-remote-SOCKS5-server-to-deal-with-crappy-VPNs app to
> use splice() somewhere.
> 
> My main annoyances with splice(2) are/were:
> 
> 1. -EAGAIN return on splice from socket/pipe to socket/pipe doesn't
>    directly tell you whether the source ran out of data or the
>    destination can't accept more data, which means you can't e.g. use
>    epoll in edge triggered mode without jumping through some minor
>    number of extra hoops.  (For a pipe you can keep track of how many
>    bytes are in it by hand, but for a socket->pipe splice -EAGAIN return
>    you'll have to assume that the pipe is full if there are >0 bytes in
>    it.)

I proceeded the same way: if splice() returns EAGAIN and there is still
data in the pipe, I assume the pipe is full and stop polling the source.
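
A minimal sketch of that rule, assuming the application counts the bytes
queued in the pipe by hand. The names (next_step, after_splice_in) are
hypothetical and do not come from the test program discussed in this
thread; the function only classifies the result of a socket->pipe
splice() call that the caller has already made.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical names for illustration only. */
enum next_step {
    KEEP_READING,    /* data was moved into the pipe */
    WAIT_FOR_INPUT,  /* EAGAIN, pipe empty: the source has no data */
    DRAIN_PIPE,      /* EAGAIN, pipe not empty: assume the pipe is full */
    SOURCE_CLOSED,   /* splice() returned 0: end of stream */
    FAILED           /* any other error */
};

/*
 * Classify the outcome of
 *     ret = splice(sock, NULL, pipe_wr, NULL, len, SPLICE_F_NONBLOCK);
 * 'err' is the errno value saved right after the call, and '*pipe_bytes'
 * is the application's own count of bytes currently held in the pipe.
 */
static enum next_step after_splice_in(ssize_t ret, int err, size_t *pipe_bytes)
{
    assert(pipe_bytes != NULL);

    if (ret > 0) {
        *pipe_bytes += (size_t)ret;
        return KEEP_READING;
    }
    if (ret == 0)
        return SOURCE_CLOSED;
    if (err == EAGAIN) {
        /* EAGAIN alone does not say which side is stuck, so the pipe
         * byte count is used to guess. */
        return *pipe_bytes > 0 ? DRAIN_PIPE : WAIT_FOR_INPUT;
    }
    return FAILED;
}
```

On DRAIN_PIPE the caller would stop polling the source and splice the
pipe out to the destination, subtracting what was written from the
counter. The guess can be wrong when the pipe holds data but is not full,
which is the ambiguity described in point 1.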

> 2. Because of (1), and because when splicing from a socket to a pipe
>    it returns after the first bit of data (you mentioned this as well),
>    you don't know at that point whether your pipe is full or not.

In fact this is fixed now. tcp_splice_read() returns all available data,
which somewhat hides problem #1. I'm running with 23 kB in a push/pull
method all the time, so it remains optimal.

> 3. Always returns -EAGAIN even if there was a FIN or error on the
>    source socket.  (Now fixed.)

Yes, I saw your fix. The bug was indeed very annoying, because the only
workaround I found was to perform a recv(MSG_PEEK) on the socket after each
EAGAIN to check whether the connection had been closed.
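
A sketch of such a probe, for kernels without the fix. The names
(peer_state, probe_peer) are hypothetical, and adding MSG_DONTWAIT is an
assumption made here so that the probe never blocks; the workaround as
described above only mentions MSG_PEEK.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical names for illustration only. */
enum peer_state { PEER_OPEN, PEER_HAS_DATA, PEER_CLOSED, PEER_ERROR };

/*
 * Peek at one byte without consuming it. Meant to be called after
 * splice() returned EAGAIN on the source socket, to tell "no data yet"
 * apart from "the peer sent a FIN" or "the socket has an error".
 */
static enum peer_state probe_peer(int fd)
{
    char byte;
    ssize_t n;

    assert(fd >= 0);

    n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return PEER_HAS_DATA;   /* data arrived since the splice() call */
    if (n == 0)
        return PEER_CLOSED;     /* orderly shutdown by the peer */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PEER_OPEN;       /* still connected, nothing to read */
    return PEER_ERROR;          /* e.g. ECONNRESET */
}
```

This costs one extra system call per EAGAIN, which is why having
splice() report the FIN or error itself is preferable.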

For these reasons, I'd really love to see the few recent fixes backported
to -stable ASAP. It will boost splice() adoption among products.

Regards,
Willy

