netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "David S. Miller" <davem@davemloft.net>
To: mst@mellanox.co.il
Cc: netdev@vger.kernel.org, rdreier@cisco.com, rick.jones2@hp.com,
	linux-kernel@vger.kernel.org, openib-general@openib.org
Subject: Re: TSO and IPoIB performance degradation
Date: Mon, 20 Mar 2006 02:37:04 -0800 (PST)	[thread overview]
Message-ID: <20060320.023704.70907203.davem@davemloft.net> (raw)
In-Reply-To: <20060320102234.GV29929@mellanox.co.il>

From: "Michael S. Tsirkin" <mst@mellanox.co.il>
Date: Mon, 20 Mar 2006 12:22:34 +0200

> Quoting r. David S. Miller <davem@davemloft.net>:
> > The path an SKB can take is opaque and unknown until the very last
> > moment it is actually given to the device transmit function.
> 
> Why, I was proposing looking at dst cache. If that's NULL, well,
> we won't stretch ACKs. Worst case we apply the wrong optimization.
> Right?

Where you receive a packet from isn't very useful for determining
even the full patch on which that packet itself flowed.

More importantly, packets also do not necessarily go back out over the
same path on which packets are received for a connection.  This is
actually quite common.

Maybe packets for this connection come in via IPoIB but go out via
gigabit ethernet and another route altogether.

> What I'd like to clarify, however: rfc2581 explicitly states that in
> some cases it might be OK to generate ACKs less frequently than
> every second full-sized segment. Given Matt's measurements, TCP on
> top of IP over InfiniBand on Linux seems to hit one of these cases.
> Do you agree to that?

I disagree with Linux changing it's behavior.  It would be great to
turn off congestion control completely over local gigabit networks,
but that isn't determinable in any way, so we don't do that.

The IPoIB situation is no different, you can set all the bits you want
in incoming packets, the barrier to doing this remains the same.

It hurts performance if any packet drop occurs because it will require
an extra round trip for recovery to begin to be triggered at the
sender.

The network is a black box, routes to and from a destination are
arbitrary, and so is packet rewriting and reflection, so being able to
say "this all occurs on IPoIB" is simply infeasible.

I don't know how else to say this, we simply cannot special case IPoIB
or any other topology type.

  reply	other threads:[~2006-03-20 10:37 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-06 22:34 TSO and IPoIB performance degradation Michael S. Tsirkin
2006-03-06 22:40 ` David S. Miller
2006-03-06 22:50 ` Stephen Hemminger
2006-03-07  3:13   ` Shirley Ma
2006-03-07 21:44     ` Matt Leininger
2006-03-07 21:49       ` Stephen Hemminger
2006-03-07 21:53         ` Michael S. Tsirkin
2006-03-08  0:11         ` Matt Leininger
2006-03-08  0:18           ` David S. Miller
2006-03-08  1:17             ` Roland Dreier
2006-03-08  1:23               ` David S. Miller
2006-03-08  1:34                 ` Roland Dreier
2006-03-08 12:53                 ` Michael S. Tsirkin
2006-03-08 20:53                   ` David S. Miller
2006-03-09 23:48                   ` David S. Miller
2006-03-10  0:10                     ` Michael S. Tsirkin
2006-03-10  0:38                       ` Michael S. Tsirkin
2006-03-10  7:18                       ` David S. Miller
2006-03-10  0:21                     ` Rick Jones
2006-03-10  7:23                       ` David S. Miller
2006-03-10 17:44                         ` Rick Jones
2006-03-20  9:06                         ` Michael S. Tsirkin
2006-03-20  9:55                           ` David S. Miller
2006-03-20 10:22                             ` Michael S. Tsirkin
2006-03-20 10:37                               ` David S. Miller [this message]
2006-03-20 11:27                                 ` Michael S. Tsirkin
2006-03-20 11:47                                   ` Arjan van de Ven
2006-03-20 11:49                                     ` Lennert Buytenhek
2006-03-20 11:53                                       ` Arjan van de Ven
2006-03-20 13:35                                         ` Michael S. Tsirkin
2006-03-20 12:04                                       ` Michael S. Tsirkin
2006-03-20 15:09                                         ` Benjamin LaHaise
2006-03-20 18:58                                           ` Rick Jones
2006-03-20 23:00                                           ` David S. Miller
2006-04-27  4:13                                 ` Troy Benjegerdes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060320.023704.70907203.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@mellanox.co.il \
    --cc=netdev@vger.kernel.org \
    --cc=openib-general@openib.org \
    --cc=rdreier@cisco.com \
    --cc=rick.jones2@hp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).