Re: TCP and reordering - Benjamin LaHaise

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Benjamin LaHaise <bcrl@kvack.org>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Vijay Subramanian <subramanian.vijay@gmail.com>,
	David Miller <davem@davemloft.net>,
	saku@ytti.fi, rick.jones2@hp.com, netdev@vger.kernel.org
Subject: Re: TCP and reordering
Date: Wed, 28 Nov 2012 12:08:24 -0500	[thread overview]
Message-ID: <20121128170824.GC19042@kvack.org> (raw)
In-Reply-To: <1354120887.21562.87.camel@shinybook.infradead.org>

On Wed, Nov 28, 2012 at 04:41:27PM +0000, David Woodhouse wrote:
> Absolutely. But in the cases where they *do* connect to the congested
> link, and the packets are backing up on the *same* host, there's no
> excuse for not actually knowing that and behaving appropriately :)

Agreed.

> And even if the congested link is somewhere upstream, you'd hope that
> something exists (like ECN) to let you know about it.
> 
> In the LNS case that I'm most familiar with, the LNS *does* know about
> the bandwidth of each customer's ADSL line, and limits the bandwidth of
> each session appropriately. It's much better to decide at the LNS which
> packets to drop, than to let the telco decide. Or worse, to have the
> ADSL link drop one ATM cell out of *every* packet when it's
> overloaded...

I'm speaking from experience: the telcos I've dealt with (2 different 
companies here in Canada) do *not* know the speed of the ADSL link being 
fed with PPPoE at the customer premises that a LAC receives as an incoming 
session.  The issue is that the aggregation network does not propagate 
that information from the DSLAM to the LAC.  It's a big mess where the 
aggregation network has a mix of ATM and L2 ethernet switches, and much 
of the gear has no support for protocols that can carry that information.

> > This sort of chaining of destructors is going to be very expensive in 
> > terms of CPU cycles.  If this does get implemented, please ensure there is 
> > a way to turn it off.
> 
> You asked that before, and I think we agreed that it would be acceptable
> to use the existing CONFIG_BQL option?

No, that would not be sufficient, as otherwise there is no means to control 
the behaviour of distribution vendor kernels -- they would most likely 
default to on.

> I'm looking at adding ppp-channel equivalents of
> netdev_{reset,sent,completed}_queue, and having the PPP channels call
> them as appropriate. For some it's trivial, but in the PPPoE/L2TP cases
> because we want to install destructors without stomping on TSQ it'll be
> substantial enough that it should be compiled out if CONFIG_BQL isn't
> enabled.

This sounds like overhead.  That said, I'd like to measure it to see what 
sort of actual effect this has on performance before passing any judgement.  
I'd be happy to put together a test setup to run anything you've come up 
with through.

> > That said, if there is local congestion, the benefits of BQL would be 
> > worthwhile to have.
> 
> If there is local congestion... *or* if you have proper bandwidth
> management on the link to the clients; either by knowing the bandwidth
> and voluntarily limiting to it, or by something like ECN.

Improved ECN support is a very good idea.

> > > But I wish there was a nicer way to chain destructors. And no, I don't
> > > count what GSO does. We can't use the cb here anyway since we're passing
> > > it down the stack.
> > 
> > I think all the tunneling protocols are going to have the same problem 
> > here, so it deserves some thought about how to tackle the issue in a 
> > generic way without incurring a large amount of overhead. 
> 
> Right. There are a few cases of skb->destructor being used at different
> levels of the stack where I suspect this might already be an issue, in
> fact. And things like TSQ will silently be losing track of packets
> because of skb_orphan, even before they've left the box.
> 
> Hah, and I note that l2tp is *already* stomping on skb->destructor for
> its own purposes. So I could potentially just use its existing callback
> and pretend I hadn't seen that it screws up TSQ, and leave the issue of
> chaining destructors to be Someone Else's Problem???.

*nod*

> Actually, I think it overwrites the destructor without calling
> skb_orphan() first ??? which will *really* upset TSQ, won't it?

Yes, that would defeat things.

> >  This exact 
> > problem is one of the reasons multilink PPP often doesn't work well over 
> > L2TP or PPPoE as compared to its behaviour over ttys.
> 
> Another fun issue with tunnelling protocols and BQL... packets tend to
> *grow* as they get encapsulated. So you might end up calling
> netdev_sent_queue() with a given size, then netdev_completed_queue()
> with a bigger packet later...

Oh fun!

Ultimately, we want to know about congestion as early as possible in the 
packet processing.  In the case of L2TP, it would be helpful to use the 
knowledge of the path the packet will be sent out on to correclty set the 
ECN bits on the packet inside the L2TP encapsulation.  The L2TP code does 
not appear to do this at present, so this needs work.

		-ben

> -- 
> dwmw2
> 



-- 
"Thought is the essence of where you are now."

next prev parent reply	other threads:[~2012-11-28 17:08 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-27  9:32 TCP and reordering Saku Ytti
2012-11-27 17:05 ` Rick Jones
2012-11-27 17:15   ` Saku Ytti
2012-11-27 18:00     ` Rick Jones
2012-11-28  2:06     ` David Miller
2012-11-28  7:26       ` Saku Ytti
2012-11-28  8:35         ` Vijay Subramanian
2012-11-28  8:54           ` Saku Ytti
2012-11-28 18:24             ` Rick Jones
2012-11-28 18:33               ` Eric Dumazet
2012-11-28 18:52                 ` Vijay Subramanian
2012-11-28 18:44               ` Saku Ytti
2012-11-28  7:59       ` David Woodhouse
2012-11-28  8:21         ` Christoph Paasch
2012-11-28  8:22         ` Vijay Subramanian
2012-11-28  9:08           ` David Woodhouse
2012-11-28 11:02             ` Eric Dumazet
2012-11-28 11:49               ` David Woodhouse
2012-11-28 12:26                 ` Eric Dumazet
2012-11-28 12:39                   ` David Woodhouse
2012-11-28 12:52                     ` Eric Dumazet
2012-11-28 15:47                       ` David Woodhouse
2012-11-28 16:09                         ` Eric Dumazet
2012-11-28 16:21                           ` David Woodhouse
2012-11-28 16:19                         ` Benjamin LaHaise
2012-11-28 16:41                           ` David Woodhouse
2012-11-28 17:08                             ` Benjamin LaHaise [this message]
2012-11-28 17:16                             ` Eric Dumazet
2012-11-28 18:01                               ` David Woodhouse
2012-11-29 15:10                                 ` Noel Grandin
2012-12-02  1:30                               ` David Woodhouse
2012-12-02  2:40                                 ` Eric Dumazet
2012-12-02  9:31                                   ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121128170824.GC19042@kvack.org \
    --to=bcrl@kvack.org \
    --cc=davem@davemloft.net \
    --cc=dwmw2@infradead.org \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=rick.jones2@hp.com \
    --cc=saku@ytti.fi \
    --cc=subramanian.vijay@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).