From: Jeff Garzik <jgarzik@pobox.com>
To: netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Cc: Werner Almesberger <werner@almesberger.net>,
	Nivedita Singhvi <niv@us.ibm.com>
Subject: Re: TOE brain dump
Date: Sun, 03 Aug 2003 02:40:33 -0400
Message-ID: <3F2CAE61.7070401@pobox.com>
In-Reply-To: <20030802184901.G5798@almesberger.net>

Werner Almesberger wrote:
> Jeff Garzik wrote:
> 
>>jabbering at the same time.  TCP is a "one size fits all" solution, but 
>>it doesn't work well for everyone.
> 
> 
> But then, ten "optimized xxPs" that work well in two different
> scenarios each, but not so good in the 98 others, wouldn't be
> much fun either.
> 
> It's been tried a number of times. Usually, real life sneaks
> in at one point or another, leaving behind a complex mess.
> When they've sorted out these problems, regular TCP has caught
> up with the great optimized transport protocols. At that point,
> they return to their niche, sometimes tail between legs and
> muttering curses, sometimes shaking their fist and boldly
> proclaiming how badly they'll rub TCP in the dirt in the next
> round. Maybe they shed off some of the complexity, and trade it
> for even more aggressive optimization, which puts them into
> their niche even more firmly. Eventually, they fade away.
> 
> There are cases where TCP doesn't work well, like a path of
> badly mismatched link layers, but such paths don't treat any
> protocol following the end-to-end principle kindly.
> 
> Another problem of TCP is that it has grown a bit too many
> knobs you need to turn before it works over your really fast
> really long pipe. (In one of the OLS after dinner speeches,
> this was quite appropriately called the "wizard gap".)
> 
> 
>>It's obviously not over a WAN...
> 
> 
> That's why NFS turned off UDP checksums ;-) As soon as you put
> it on IP, it will crawl to distances you didn't imagine in your
> wildest dreams. It always does.

Really fast, really long pipes in practice don't exist for 99.9% of all 
Internet users.
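
For a sense of scale on what "really fast, really long" means, here is a rough bandwidth-delay calculation. The 1 Gbit/s and 100 ms figures are illustrative assumptions, not numbers from this thread, and `bdp_bytes` is a made-up helper name.

```python
# Bandwidth-delay product: bytes that must be in flight to keep a pipe full.
# The link speed and RTT below are assumed example values.

MAX_UNSCALED_WINDOW = 65535  # largest TCP window without window scaling


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Return the bandwidth-delay product in bytes (integer arithmetic)."""
    # bits/s * ms = millibits; divide by 1000 for bits and by 8 for bytes.
    return bandwidth_bps * rtt_ms // 8000


if __name__ == "__main__":
    needed = bdp_bytes(1_000_000_000, 100)  # 1 Gbit/s, 100 ms round trip
    print("window needed:", needed, "bytes")
    print("fits unscaled window:", needed <= MAX_UNSCALED_WINDOW)
```

With these assumed numbers the required window is far beyond what an unscaled TCP window can advertise, which is the kind of path where the tuning knobs start to matter.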


When you approach traffic levels that make you want to offload most of
the TCP net stack, then TCP isn't the right solution for you anymore,
all things considered.


The Linux net stack just isn't built to be offloaded.  TOE engines will 
either need to (1) fall back to Linux software for all-but-the-common 
case (otherwise netfilter, etc. break), or, (2) will need to be 
hideously complex beasts themselves.  And I can't see ASIC and firmware 
designers being excited about implementing netfilter on a PCI card :)
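
Option (1) can be pictured as a per-flow decision made by the host. The sketch below is only an illustration: the `Flow` fields and the `choose_path` name are hypothetical, and the conditions are assumptions about what "the common case" might mean, not a description of any real TOE driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical summary of a TCP flow as seen by the host stack."""
    established: bool
    has_ip_options: bool
    fragmented: bool
    netfilter_rules_apply: bool


def choose_path(flow: Flow) -> str:
    """Return "offload" only for the plain common case, else "software"."""
    if not flow.established:
        return "software"  # setup and teardown stay in the host stack
    if flow.has_ip_options or flow.fragmented:
        return "software"  # unusual packets take the slow path
    if flow.netfilter_rules_apply:
        return "software"  # filtering must keep working, so no offload
    return "offload"


if __name__ == "__main__":
    print(choose_path(Flow(True, False, False, False)))
    print(choose_path(Flow(True, False, False, True)))
```

Anything the card cannot handle identically to the software stack has to land in the "software" branch, which is why the offloaded set shrinks to the simplest flows.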

Unfortunately some vendors seem to be choosing TOE option #3:  TCP offload
which introduces many limitations (connection limits, netfilter not 
supported, etc.) which Linux never had before.  Vendors don't seem to 
realize TOE has real potential to damage the "good network neighbor" 
image the net stack has.  The Linux net stack's behavior is known, 
documented, predictable.  TOE changes all that.

There is one interesting TOE solution, that I have yet to see created: 
run Linux on an embedded processor, on the NIC.  This stripped-down 
Linux kernel would perform all the header parsing, checksumming, etc. 
into the NIC's local RAM.  The Linux OS driver interface becomes a 
virtual interface with a large MTU, that communicates from host CPU to 
NIC across the PCI bus using jumbo-ethernet-like data frames. 
Management frames would control the ethernet interface on the other side 
of the PCI bus "tunnel".
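
A minimal sketch of what the framing across that PCI "tunnel" might look like, assuming an invented 8-byte header that separates data frames from management frames. The field layout, the constants and the function names are all hypothetical; nothing here comes from an existing driver or specification.

```python
import struct

# Hypothetical header: kind (1 byte), opcode (1 byte), reserved (2 bytes),
# payload length (4 bytes), all in network byte order. 8 bytes in total.
HEADER = struct.Struct("!BBHI")

KIND_DATA = 0      # bulk payload, may be much larger than an ethernet MTU
KIND_MGMT = 1      # control message for the NIC-side ethernet interface
MGMT_SET_LINK = 1  # made-up management opcode


def pack_frame(kind: int, opcode: int, payload: bytes) -> bytes:
    """Prefix the payload with the hypothetical tunnel header."""
    return HEADER.pack(kind, opcode, 0, len(payload)) + payload


def unpack_frame(buf: bytes):
    """Split a frame into (kind, opcode, payload); reject short buffers."""
    kind, opcode, _reserved, length = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return kind, opcode, payload


if __name__ == "__main__":
    big = pack_frame(KIND_DATA, 0, b"x" * 60000)  # one large data frame
    ctl = pack_frame(KIND_MGMT, MGMT_SET_LINK, b"up")
    print(len(big), unpack_frame(ctl))
```

The 32-bit length field is there so a single data frame can carry far more than a wire-sized packet, leaving segmentation to the kernel running on the NIC.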


>>So, fix the other end of the pipeline too, otherwise this fast network 
>>stuff is flashy but pointless.  If you want to serve up data from disk,
>>then start creating PCI cards that have both Serial ATA and ethernet 
>>connectors on them :)  Cut out the middleman of the host CPU and host 
>>memory bus instead of offloading portions of TCP that do not need to be 
>>offloaded.
> 
> 
> That's a good point. A hierarchical memory structure can help
> here. Moving one end closer to the hardware, and letting it
> know (e.g. through sendfile) that also the other end is close
> (or can be reached more directly than through some hopelessly
> crowded main bus) may help too.

Definitely.

	Jeff
