From: "David S. Miller" <davem@davemloft.net>
To: herbert@gondor.apana.org.au
Cc: netdev@oss.sgi.com
Subject: Re: issue with new TCP TSO stuff
Date: Thu, 12 May 2005 13:13:49 -0700 (PDT)
Message-ID: <20050512.131349.32715242.davem@davemloft.net>
In-Reply-To: <E1DWAZg-0006aD-00@gondolin.me.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: issue with new TCP TSO stuff
Date: Thu, 12 May 2005 20:05:48 +1000

> What we could do is get the TSO drivers to all implement NETIF_F_FRAGLIST.
> Once they do that, you can simply chain up the skb's and send it off to
> them.  The coalescing will need to be done in the drivers.  However, that's
> not too bad because coalescing only has to be done at the skb boundaries.
> 
> In fact, this is how we can simplify the unwinding stuff in your
> skb_append_pages function.  Because the coalescing only needs to occur
> between skb's, you only need to check the first frag to know whether it
> can be coalesced or not.  This means that the unwinding stuff can mostly
> go away.
> 
> We'll have to watch out for retransmissions of the frame with a non-null
> frag_list pointer.  They will need to be copied if their clone is still
> hanging around.

Yes, we can just add a frag_list pointer check to the skb_cloned()
tests we do when cons'ing up retransmit SKBs for tcp_transmit_skb().
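
As a rough illustration of that check, here is a userspace sketch in C.
It is not kernel code: struct fake_skb and both helpers are hypothetical
stand-ins, and the policy of copying whenever the buffer is shared OR
carries a chain is only one possible reading of the idea above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct sk_buff. */
struct fake_skb {
    bool cloned;                 /* a clone of this buffer was made */
    int dataref;                 /* references to the shared data area */
    struct fake_skb *frag_list;  /* next buffer in a chain, or NULL */
};

/* Models the idea behind skb_cloned(): the data is shared with a live clone. */
static bool fake_skb_cloned(const struct fake_skb *skb)
{
    return skb->cloned && skb->dataref != 1;
}

/*
 * Assumed policy: a retransmit needs a private copy when the buffer is
 * still shared with a clone or when it has a non-null frag_list pointer.
 */
static bool retransmit_needs_copy(const struct fake_skb *skb)
{
    return fake_skb_cloned(skb) || skb->frag_list != NULL;
}
```

A buffer that is neither shared nor chained takes the cheap path; either
condition alone is enough to force the copy in this sketch.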

But this still has the early free problem, I think.  If an ACK
comes in which releases an SKB on the chain, while the driver
is still working with that chain, we cannot free the SKB.  We
have to do it some time later.

One way to prevent that would be to do an skb_get() on every
SKB in the chain, but then we're back to the original problem
of all the extra atomic operations.
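
To make that cost concrete, here is a userspace sketch using C11
atomics. The struct and function names are hypothetical and only model
the reference count; the point is that holding a chain of N buffers
costs N atomic increments up front and N atomic decrements later.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an SKB: just a refcount and a chain link. */
struct chain_skb {
    atomic_int users;
    struct chain_skb *next;
};

/*
 * Take one reference on every buffer in the chain, as an skb_get() per
 * SKB would. Returns the number of atomic operations performed.
 */
static size_t chain_hold(struct chain_skb *head)
{
    size_t ops = 0;

    for (struct chain_skb *skb = head; skb != NULL; skb = skb->next) {
        atomic_fetch_add(&skb->users, 1);
        ops++;
    }
    return ops;
}

/* Drop one reference; returns 1 if the caller held the last one. */
static int chain_skb_put(struct chain_skb *skb)
{
    return atomic_fetch_sub(&skb->users, 1) == 1;
}
```

With the extra reference held, an ACK that drops the queue's reference
no longer frees the buffer under the driver; the driver's own put does
so later.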

A secondary point is that I'd like to use a name other than
NETIF_F_FRAGLIST because people are super confused as to what this
device flag even means.  Some people confuse it with NETIF_F_SG,
others think it takes a huge UDP frame and fragments it into MTU sized
IP frames and checksums the whole thing.  None of which is true.

Loopback is the only driver which supports this properly, by
simply doing nothing with the packet :-)

So back to the main point, we are in quite a conundrum.  The whole
point of TSO is to offload the segmentation overhead, but we're in
fact making the TCP output engine more expensive for the TSO path.

I've also considered a longer term idea where we store the write queue
in some minimal abstract format, instead of a list of SKBs.  Just a
data collection and some sequence numbers.  But that would be a huge
change with questionable gains.
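
Purely as an illustration of what "some sequence numbers" might look
like, here is a userspace sketch in C. Everything in it is hypothetical:
the struct, the field names, and the ACK handling are assumptions, and
the data collection itself is left out entirely.

```c
#include <stdint.h>

/* Hypothetical write queue that tracks only a byte range, no SKB list. */
struct wq_sketch {
    uint32_t snd_una;    /* first unacknowledged sequence number */
    uint32_t write_seq;  /* one past the last queued byte */
};

/* Wrap-safe "a comes before b" comparison on 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * Advance snd_una to ack. Returns the number of bytes released, or 0
 * if the ACK is old, duplicate, or beyond the queued data.
 */
static uint32_t wq_ack(struct wq_sketch *q, uint32_t ack)
{
    uint32_t released;

    if (!seq_before(q->snd_una, ack) || seq_before(q->write_seq, ack))
        return 0;

    released = ack - q->snd_una;
    q->snd_una = ack;
    return released;
}
```

In such a layout an ACK only moves a sequence number forward, so there
is no per-SKB free to race with a driver still holding the data.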

Thread overview: 17+ messages
2005-05-12  5:30 issue with new TCP TSO stuff David S. Miller
2005-05-12 10:05 ` Herbert Xu
2005-05-12 20:13   ` David S. Miller [this message]
2005-05-12 21:47     ` Herbert Xu
2005-05-12 22:10       ` Herbert Xu
2005-05-12 22:52         ` David S. Miller
2005-05-12 23:10           ` Herbert Xu
2005-05-12 23:24             ` David S. Miller
2005-05-12 23:52               ` Herbert Xu
2005-05-13  4:36                 ` David S. Miller
2005-05-13 13:25                   ` Herbert Xu
2005-05-12 22:46       ` David S. Miller
2005-05-12 14:13 ` Andi Kleen
2005-05-12 19:26   ` David S. Miller
     [not found]     ` <20050512200251.GA72662@muc.de>
2005-05-12 20:03       ` David S. Miller
2005-05-12 20:26         ` Andi Kleen
2005-05-12 22:34           ` David S. Miller
