From: ebiederm@xmission.com (Eric W. Biederman)
To: Werner Almesberger <werner@almesberger.net>
Cc: Jeff Garzik <jgarzik@pobox.com>,
Nivedita Singhvi <niv@us.ibm.com>,
netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: TOE brain dump
Date: 03 Aug 2003 13:21:09 -0600
Message-ID: <m1fzkiwnru.fsf@frodo.biederman.org>
In-Reply-To: <20030802184901.G5798@almesberger.net>
Werner Almesberger <werner@almesberger.net> writes:
> Jeff Garzik wrote:
> > jabbering at the same time. TCP is a "one size fits all" solution, but
> > it doesn't work well for everyone.
>
> But then, ten "optimized xxPs" that work well in two different
> scenarios each, but not so good in the 98 others, wouldn't be
> much fun either.
The cases optimized for low latency seem to have a strong
market in clusters, and they are currently keeping quite a few
technologies alive: Myrinet, Infiniband, Quadrics' Elan, etc.
Low latency combined with switch technology that scales is quite
rare currently.
> Another problem of TCP is that it has grown a bit too many
> knobs you need to turn before it works over your really fast
> really long pipe. (In one of the OLS after dinner speeches,
> this was quite appropriately called the "wizard gap".)
Does anyone know which knobs to turn to make TCP go fast over
Infiniband (a low-latency, high-bandwidth network)? I get to
deal with such networks on a regular basis...
There is one place in low-latency communications that I can think
of where TCP/IP is not the proper solution. For low-latency
communication the checksum is at the wrong end of the packet: it
sits in the header, ahead of the data it covers, so the sender has
to see the whole payload before the first byte can go out.
IB gets this one correct and places the checksum at the tail end of
the packet. This allows the packet to start transmitting before
the checksum is computed, possibly even letting the receive start
at the other end before the tail of the packet is transmitted.
Would it make any sense to do a low latency variation on TCP that
fixes that problem? For the IP header we are fine as the data
precedes the checksum. But the problem appears to affect all
of the upper-level protocols that ride on IP: UDP, TCP, SCTP...
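To make the header-versus-trailer point concrete, here is a minimal
Python sketch. It is only an illustration under stated assumptions:
the function names are made up for this example, both framings use
the same 16-bit one's-complement sum (in the style of the Internet
checksum) so that only the position of the checksum differs, and
chunks are assumed to be even-length. IB itself uses CRCs rather
than this sum, so this is not its wire format.

```python
from typing import Iterable, Iterator


def ones_complement_sum(data: bytes, acc: int = 0) -> int:
    """Add the big-endian 16-bit words of `data` into the running sum `acc`."""
    if len(data) % 2:
        data += b"\x00"  # pad a trailing odd byte (sketch assumes even chunks)
    for i in range(0, len(data), 2):
        acc += (data[i] << 8) | data[i + 1]
        acc = (acc & 0xFFFF) + (acc >> 16)  # fold the carry back in
    return acc


def send_header_checksum(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Checksum-first framing: every chunk must be seen before anything is sent."""
    buffered = list(chunks)  # forced to buffer the whole payload
    acc = 0
    for chunk in buffered:
        acc = ones_complement_sum(chunk, acc)
    yield (~acc & 0xFFFF).to_bytes(2, "big")
    yield from buffered


def send_trailer_checksum(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Checksum-last framing: each chunk goes out as soon as it is available."""
    acc = 0
    for chunk in chunks:
        acc = ones_complement_sum(chunk, acc)
        yield chunk  # transmitted before the checksum is known
    yield (~acc & 0xFFFF).to_bytes(2, "big")


def verify(frame: bytes) -> bool:
    """A frame is valid if data plus checksum sums to 0xFFFF."""
    return ones_complement_sum(frame) == 0xFFFF
```

With the trailer framing, the first chunk can be pulled from the
generator before the later chunks even exist; with the header
framing, the generator cannot produce anything until the whole
payload has been consumed. The receiver-side check is the same in
both cases because the sum does not depend on word order.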
> > So, fix the other end of the pipeline too, otherwise this fast network
> > stuff is flashy but pointless. If you want to serve up data from disk,
> > then start creating PCI cards that have both Serial ATA and ethernet
> > connectors on them :) Cut out the middleman of the host CPU and host
> > memory bus instead of offloading portions of TCP that do not need to be
> > offloaded.
>
> That's a good point. A hierarchical memory structure can help
> here. Moving one end closer to the hardware, and letting it
> know (e.g. through sendfile) that also the other end is close
> (or can be reached more directly than through some hopelessly
> crowded main bus) may help too.
On that score it is worth noting that the next generation of
peripheral buses (Hypertransport, PCI Express, etc.) are all switched,
which means that device-to-device communication may be more
reasonable. Going from a bussed interconnect to a switched
interconnect is certainly a dramatic change in infrastructure. How
that will affect the tradeoffs I don't know.
Eric