netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Benjamin LaHaise <bcrl@kvack.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Vijay Subramanian <subramanian.vijay@gmail.com>,
	David Miller <davem@davemloft.net>,
	saku@ytti.fi, rick.jones2@hp.com, netdev@vger.kernel.org
Subject: Re: TCP and reordering
Date: Wed, 28 Nov 2012 16:41:27 +0000	[thread overview]
Message-ID: <1354120887.21562.87.camel@shinybook.infradead.org> (raw)
In-Reply-To: <20121128161930.GB19042@kvack.org>

[-- Attachment #1: Type: text/plain, Size: 3676 bytes --]

On Wed, 2012-11-28 at 11:19 -0500, Benjamin LaHaise wrote:
> On Wed, Nov 28, 2012 at 03:47:15PM +0000, David Woodhouse wrote:
> > On Wed, 2012-11-28 at 04:52 -0800, Eric Dumazet wrote:
> > > BQL is nice for high speed adapters.
> > 
> > For adapters with hugely deep queues, surely? There's a massive
> > correlation between the two, of course ??? but PPP over L2TP or PPPoE
> > ought to be included in the classification, right?
> 
> Possibly, but there are many setups where PPPoE/L2TP do not connect to 
> the congested link directly.

Absolutely. But in the cases where they *do* connect to the congested
link, and the packets are backing up on the *same* host, there's no
excuse for not actually knowing that and behaving appropriately :)

And even if the congested link is somewhere upstream, you'd hope that
something exists (like ECN) to let you know about it.

In the LNS case that I'm most familiar with, the LNS *does* know about
the bandwidth of each customer's ADSL line, and limits the bandwidth of
each session appropriately. It's much better to decide at the LNS which
packets to drop, than to let the telco decide. Or worse, to have the
ADSL link drop one ATM cell out of *every* packet when it's
overloaded...

> This sort of chaining of destructors is going to be very expensive in 
> terms of CPU cycles.  If this does get implemented, please ensure there is 
> a way to turn it off.

You asked that before, and I think we agreed that it would be acceptable
to use the existing CONFIG_BQL option?

I'm looking at adding ppp-channel equivalents of
netdev_{reset,sent,completed}_queue, and having the PPP channels call
them as appropriate. For some it's trivial, but in the PPPoE/L2TP cases
because we want to install destructors without stomping on TSQ it'll be
substantial enough that it should be compiled out if CONFIG_BQL isn't
enabled.

> That said, if there is local congestion, the benefits of BQL would be 
> worthwhile to have.

If there is local congestion... *or* if you have proper bandwidth
management on the link to the clients; either by knowing the bandwidth
and voluntarily limiting to it, or by something like ECN.

> > But I wish there was a nicer way to chain destructors. And no, I don't
> > count what GSO does. We can't use the cb here anyway since we're passing
> > it down the stack.
> 
> I think all the tunneling protocols are going to have the same problem 
> here, so it deserves some thought about how to tackle the issue in a 
> generic way without incurring a large amount of overhead. 

Right. There are a few cases of skb->destructor being used at different
levels of the stack where I suspect this might already be an issue, in
fact. And things like TSQ will silently be losing track of packets
because of skb_orphan, even before they've left the box.

Hah, and I note that l2tp is *already* stomping on skb->destructor for
its own purposes. So I could potentially just use its existing callback
and pretend I hadn't seen that it screws up TSQ, and leave the issue of
chaining destructors to be Someone Else's Problem™.

Actually, I think it overwrites the destructor without calling
skb_orphan() first — which will *really* upset TSQ, won't it?

>  This exact 
> problem is one of the reasons multilink PPP often doesn't work well over 
> L2TP or PPPoE as compared to its behaviour over ttys.

Another fun issue with tunnelling protocols and BQL... packets tend to
*grow* as they get encapsulated. So you might end up calling
netdev_sent_queue() with a given size, then netdev_completed_queue()
with a bigger packet later...

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

  reply	other threads:[~2012-11-28 16:41 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-27  9:32 TCP and reordering Saku Ytti
2012-11-27 17:05 ` Rick Jones
2012-11-27 17:15   ` Saku Ytti
2012-11-27 18:00     ` Rick Jones
2012-11-28  2:06     ` David Miller
2012-11-28  7:26       ` Saku Ytti
2012-11-28  8:35         ` Vijay Subramanian
2012-11-28  8:54           ` Saku Ytti
2012-11-28 18:24             ` Rick Jones
2012-11-28 18:33               ` Eric Dumazet
2012-11-28 18:52                 ` Vijay Subramanian
2012-11-28 18:44               ` Saku Ytti
2012-11-28  7:59       ` David Woodhouse
2012-11-28  8:21         ` Christoph Paasch
2012-11-28  8:22         ` Vijay Subramanian
2012-11-28  9:08           ` David Woodhouse
2012-11-28 11:02             ` Eric Dumazet
2012-11-28 11:49               ` David Woodhouse
2012-11-28 12:26                 ` Eric Dumazet
2012-11-28 12:39                   ` David Woodhouse
2012-11-28 12:52                     ` Eric Dumazet
2012-11-28 15:47                       ` David Woodhouse
2012-11-28 16:09                         ` Eric Dumazet
2012-11-28 16:21                           ` David Woodhouse
2012-11-28 16:19                         ` Benjamin LaHaise
2012-11-28 16:41                           ` David Woodhouse [this message]
2012-11-28 17:08                             ` Benjamin LaHaise
2012-11-28 17:16                             ` Eric Dumazet
2012-11-28 18:01                               ` David Woodhouse
2012-11-29 15:10                                 ` Noel Grandin
2012-12-02  1:30                               ` David Woodhouse
2012-12-02  2:40                                 ` Eric Dumazet
2012-12-02  9:31                                   ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1354120887.21562.87.camel@shinybook.infradead.org \
    --to=dwmw2@infradead.org \
    --cc=bcrl@kvack.org \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=rick.jones2@hp.com \
    --cc=saku@ytti.fi \
    --cc=subramanian.vijay@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).