From: Willy Tarreau <w@1wt.eu>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: chetan loke <loke.chetan@gmail.com>,
Andreas Gruenbacher <agruen@linbit.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Herbert Xu <herbert@gondor.hengli.com.au>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: [RFC] [TCP 0/3] Receive from socket into bio without copying
Date: Tue, 3 Jul 2012 02:02:02 +0200
Message-ID: <20120703000202.GC11039@1wt.eu>
In-Reply-To: <1341265024.22621.464.camel@edumazet-glaptop>

Hi Eric,

On Mon, Jul 02, 2012 at 11:37:04PM +0200, Eric Dumazet wrote:
> On Mon, 2012-07-02 at 15:41 -0400, chetan loke wrote:
> > On Mon, Jul 2, 2012 at 12:06 PM, Andreas Gruenbacher <agruen@linbit.com> wrote:
> > > On Mon, 2012-07-02 at 15:54 +0200, Eric Dumazet wrote:
> > >> So I will just say no to your patches, unless you demonstrate the
> > >> splice() problems, and how you can fix the alignment problem in a new
> > >> layer instead of in the existing zero copy standard one.
> > >
> > > Again, splice or not is not the issue here. It does not, by itself, allow zero
> > > copy from the network directly to disk but it could likely be made to support
> > > that if we can get the alignment right first. The proposed MSG_NEW_PACKET flag
> > > helps with that, but maybe someone has a better idea.
> > >
> >
> > Eric - by using splice do you mean something like:
> >
> > int filedes[2];
> > PIPE_SIZE (64*1024)
> > pipe(filedes);
> > ret = splice (sock_fd_from, &from_offset, filedes [1], NULL, PIPE_SIZE,
> > SPLICE_F_MORE | SPLICE_F_MOVE);
> >
> >
> > ret = splice (filedes [0], NULL, file_fd_to,
> > &to_offset, ret,
> > SPLICE_F_MORE | SPLICE_F_MOVE);
> >
>
> Yes, thats more or less the plan. You also can play with bigger
> PIPE_SIZE if needed.

I can confirm this: a larger pipe is recommended at high bit rates if
you're working with large windows.

> > i.e. splice-in from socket to pipe, and splice-out from pipe to destination?
> >
> > Andreas - if the above assumption is true then can you apply the
> > 'MSG_NEW_PACKET' on the sender and see if the above pseudo-splice code
> > achieves something similar to what you expect on the receive side(you
> > can also play w/ F_SETPIPE_SZ - although I found very little
> > reduction in CPU usage)? Note: My personal experience - using splice
> > from an input-file-A to output-file-B bought very minimal cpu
> > reduction(yes, both the files used O_DIRECT). Instead, a simple
> > read/write w/ O_DIRECT from file-A to file-B was much much faster.
>
> splice() performance from socket to pipe have improved a lot in
> linux-3.5
>
> It was not true zero copy, until very recent patches.

In fact it was true zero copy in 2.6.25, until we faced a large number
of data corruption cases and zero copy was disabled in 2.6.25.X. It
stayed that way until you brought your patches to reinstate it.

> (It was zero copy only on certain class of NIC, not on the ones found
> on appliances or cheap platforms)
>
> Willy Tarreau mentioned a nice boost of performance with haproxy.

Yes, definitely. The savings are more noticeable on small systems where
memory bandwidth is limited. On a small ARM system bound by RAM bandwidth,
performance basically doubled. But I also observed nice savings on a
Core 2 Duo equipped with two Myricom 10Gig NICs forwarding at line rate.

> Willy wanted to work on a direct splice from socket to socket, but
> I am not sure it'll bring major speed improvement.

I'm not sure at all either. I'm betting on a few percent saved from the
reduced number of syscalls, not much more. This is why I'll probably
only check this when I have enough time to kill.

Regards,
Willy