netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Gruenbacher <agruen@linbit.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [RFC] [TCP 0/3] Receive from socket into bio without copying
Date: Mon, 02 Jul 2012 18:06:31 +0200	[thread overview]
Message-ID: <1341245191.2177.40.camel@schurl.lan> (raw)
In-Reply-To: <1341237299.22621.88.camel@edumazet-glaptop>

On Mon, 2012-07-02 at 15:54 +0200, Eric Dumazet wrote:
> On Mon, 2012-07-02 at 15:02 +0200, Andreas Gruenbacher wrote:
> > bio_vec's have some alignment requirements that must be met, and
> > anything that doesn't meet those requirements can't be passed to the
> > block layer (without copying it first). Additional layers between
> > the
> > network and block layers, like a pipe, won't make that problem go
> > away.
> >
> 
> What are the "some alignment requirements" exactly, and how do you use
> TCP exactly to meet them ? (MSS= multiple of 512 ?)

Sectors of 512 bytes must be contiguous; some devices have additional
requirements (like 4k sectors).  I'm not sure if sectors always need to be
aligned, but if buffers are allocated page wise and handed out as half /
full pages, you get that automatically.

> I believe you try to escape from the real problem.
> 
> If the NIC driver provides non aligned data, neither splice() or your
> new stuff will magically align it. You _need_ a copy in either cases.

Yes, the NIC must provide aligned data.  A prerequisite for that is that the
NIC knows how to align things.  With no knowledge of the application protocol,
the NIC can only use the packet boundaries as hints.  I'm trying to get tcp
to start new packets at specific points in the protocol so that the packet
boundaries will coincide with alignment boundaries.  With that, NICs that do
header splitting can receive packets into appropriately aligned buffers, and
the problem is solved.

> If NIC driver provides aligned data, splice(socket -> pipe) will keep
> this alignment for you at 0 cost.

Yes of course.  That is not the real issue here though.

> > It's not already there, it requires the alignment issue to be
> > addresses first.
> 
> There is no guarantee TCP payload is aligned to a bio, ever, in linux
> ethernet/ip/tcp stack.
> 
> Really, your patches work for you, by pure luck, because you use one
> particular NIC driver that happens to prepare things for you
> (presumably doing header split). Nothing guarantee this wont change even
> for the same hardware in linux-3.8

NICs with header splitting are common enough that you don't have to resort
to pure luck to get one.

> So I will just say no to your patches, unless you demonstrate the
> splice() problems, and how you can fix the alignment problem in a new
> layer instead of in the existing zero copy standard one.

Again, splice or not is not the issue here. It does not, by itself, allow zero
copy from the network directly to disk but it could likely be made to support
that if we can get the alignment right first.  The proposed MSG_NEW_PACKET flag
helps with that, but maybe someone has a better idea.

This doesn't have to work with arbitrary NICs and it most likely will hurt with
small MTUs, but then you can still choose not to use it.  It just has to almost
always work with some particular NICs and with large MTUs.

Andreas

  reply	other threads:[~2012-07-02 16:06 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-29 14:53 [RFC] [TCP 0/3] Receive from socket into bio without copying Andreas Gruenbacher
2012-06-29 15:08 ` Eric Dumazet
2012-07-02 11:45   ` Andreas Gruenbacher
2012-07-02 12:36     ` Eric Dumazet
2012-07-02 13:02       ` Andreas Gruenbacher
2012-07-02 13:54         ` Eric Dumazet
2012-07-02 16:06           ` Andreas Gruenbacher [this message]
2012-07-02 19:41             ` chetan loke
2012-07-02 21:37               ` Eric Dumazet
2012-07-03  0:02                 ` Willy Tarreau
2012-07-02 13:39     ` saeed bishara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1341245191.2177.40.camel@schurl.lan \
    --to=agruen@linbit.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).