netdev.vger.kernel.org archive mirror
From: Tore Anderson <tore@fud.no>
To: "Maciej Żenczykowski" <maze@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Tom Herbert <therbert@google.com>
Subject: Re: [PATCH net-next] ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing
Date: Wed, 25 Apr 2012 11:20:01 +0200
Message-ID: <f88609a3e80bbe53233e62dec2699a3e@greed.fud.no>
In-Reply-To: <CANP3RGdhs8s_RytR=f8ismSZdGs91bpVq=ZAjb0EOm-gCsDPAw@mail.gmail.com>

* Maciej Żenczykowski

>> But we chose to _not_ decrease mtu and adhere to the specs.
>
> I get that we _choose_ to behave such, and I agree this adheres to
> specs.

"Chose" (past), not "choose" (present). ;-)

This patch does not make this choice. This patch merely fixes a bug in
the implementation of the choice that was made a long time ago.

> But I'm not convinced that (even though this is allowed per RFC) this
> is the right choice.

That is a different issue entirely, but I don't disagree with you. A
"min_pmtu" sysctl or something like that would be useful.

> Also note that IPv6 prefers to see fragmentation happen at the end
> hosts, and not at the routers.
> Although of course it doesn't treat a tunnel end point as a router.

Actually, in IPv6, fragmentation *must* be performed by end hosts;
routers (including tunnel end points) *cannot* fragment.

However, the use case for the allfrag feature is not handling tunnels,
but IPv4<->IPv6 translation. The issue is that an IPv6 host may very well
receive an ICMPv6 Packet Too Big indicating a PMTU of <1280 that was
originally transmitted by an IPv4 router (as an ICMPv4 Need To Fragment)
and underwent translation to IPv6.

In this case, the IPv6 node does not need to reduce the PMTU to <1280
(Linux does not), but it is not invalid to have a <1280 MTU link in the
IPv4 internet either, so something else must be done for the
communication to work. The solution is then to include the IPv6 Fragment
extension header, so that the translator has a suitable Identification
value to copy into the translated IPv4 header, and may therefore clear
the Don't Fragment flag, so that the IPv4 router will fragment the
packet as it is forwarded onto the low-MTU link.
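
To make the two halves of that mechanism concrete, here is a rough
sketch in Python. It is not the kernel code and not any real
translator's code; PathState, on_packet_too_big and translate_flags are
hypothetical names made up for illustration, and the translator half
assumes a simple stateless translator that copies the low-order 16 bits
of the Fragment header's Identification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

IPV6_MIN_MTU = 1280  # minimum link MTU that IPv6 requires


@dataclass
class PathState:
    """Hypothetical per-destination state kept by the IPv6 sender."""
    pmtu: int = 1500
    allfrag: bool = False  # if set, add a Fragment header to every packet


def on_packet_too_big(state: PathState, reported_mtu: int) -> None:
    """Sender side: handle an ICMPv6 Packet Too Big for this path."""
    if reported_mtu < IPV6_MIN_MTU:
        # Do not shrink below 1280; always send a Fragment header instead.
        state.pmtu = IPV6_MIN_MTU
        state.allfrag = True
    else:
        state.pmtu = min(state.pmtu, reported_mtu)


def translate_flags(frag_id: Optional[int]) -> Tuple[int, bool]:
    """Translator side: return (IPv4 Identification, Don't Fragment flag).

    frag_id is the 32-bit Identification from the IPv6 Fragment header,
    or None when the packet carries no Fragment header.
    """
    if frag_id is None:
        # No usable Identification, so IPv4 routers must not fragment.
        return 0, True
    # Copy the low-order 16 bits and let IPv4 routers fragment.
    return frag_id & 0xFFFF, False
```

With this sketch, a Packet Too Big reporting 576 leaves the PMTU at 1280
and turns allfrag on, and any later packet carrying a Fragment header is
translated with DF cleared.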

In case you're interested, I have a slide deck below that explains the
use case for IPv4<->IPv6 translation. Slide 25 is about the particular
corner case where the allfrag feature is necessary. URL:

http://fud.no/talks/20120417-RIPE64-The_Case_for_IPv6_Only_Data_Centres.pdf

Tore

Thread overview: 24+ messages
2012-04-24 17:37 [PATCH net-next] ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing Eric Dumazet
2012-04-24 19:49 ` Maciej Żenczykowski
2012-04-24 20:10   ` Eric Dumazet
2012-04-24 21:50     ` Maciej Żenczykowski
2012-04-24 21:51       ` Maciej Żenczykowski
2012-04-25  5:32       ` Eric Dumazet
2012-04-25  7:34         ` Maciej Żenczykowski
2012-04-25  9:20           ` Tore Anderson [this message]
2012-04-25  9:38             ` Eric Dumazet
2012-04-25  9:51               ` Tore Anderson
2012-04-25  9:52               ` Maciej Żenczykowski
2012-04-25 10:02               ` Eric Dumazet
2012-04-25 18:39                 ` David Miller
2012-04-25  9:48             ` Maciej Żenczykowski
2012-04-25 10:04               ` Tore Anderson
2012-04-25 10:15                 ` Eric Dumazet
2012-04-25 10:30                 ` Maciej Żenczykowski
2012-04-25 10:44                   ` Eric Dumazet
2012-04-26 10:32                     ` Tore Anderson
2012-04-25 10:45                   ` Tore Anderson
2012-04-25 11:02                     ` Maciej Żenczykowski
2012-04-25 11:49                       ` Tore Anderson
2012-04-25 11:55                         ` Maciej Żenczykowski
2012-04-27  4:03 ` David Miller
