From: Timo Teras <timo.teras@iki.fi>
To: netdev@vger.kernel.org
Subject: Re: linux-3.6+, gre+ipsec+forwarding = IP fragmentation broken
Date: Fri, 15 Mar 2013 13:38:20 +0200
Message-ID: <20130315133820.006a42f6@vostro>
In-Reply-To: <20130315112516.4b1651ca@vostro>

On Fri, 15 Mar 2013 11:25:16 +0200
Timo Teras <timo.teras@iki.fi> wrote:
> On Wed, 13 Mar 2013 17:14:53 +0200
> Timo Teras <timo.teras@iki.fi> wrote:
>
> > In the typical DMVPN setup with IPv4-ESP-GRE-IPv4 stack, it seems
> > that IPv4 fragmentation broke around 3.6 for forwarded packets.
> >
> > It would seem that fragmentation works for locally generated
> > packets. Also PMTU (DF set) seems to work for both forwarded and
> > locally generated packets. But forwarded packets to gre device that
> > gets IPsec encrypted do not get fragmented properly.
> >
> > 3.4.x kernels work, 3.6 and 3.8 series tested and fail similarly.
>
> Actually 3.4.x vanilla does not work. It works only with 38d523e
> "ipv4: Remove output route check in ipv4_mtu" applied which I've been
> cherry-picking to my builds.
>
> > I was going through the changelog and it seems that MTU is now
> > handled in nexthop exceptions and one needs to produce the full
> > flow info to update it. I'm wondering if this does not hold true in
> > my code path as ip_gre rewraps the forwarded packet and creates new
> > IP header - when it next goes to the xfrm code (which sends the
> > ICMP error) the inner iphdr is no longer accessible. Would this
> > cause the breakage that I'm seeing? Or is the forward flow's mtu
> > still updated somehow?
>
> I have now a theory on what goes wrong.
>
> My gre tunnel is configured with 'ttl 64' so the tunnel IP header
> always gets DF bit set to do proper path-mtu. The kind of locally
> generated ICMP messages I get, imply that re-fragmentation happens
> only on the tunnel's IPv4 header level - but it'll be too late then:
> the large packet is queued, IPsec'ed, and it is the IPsec'ed packet
> that the stack then tries to fragment (but it has DF set, so that
> fails and the packet is dropped).
>
> I believe ip_gre should explicitly fragment the inner IPv4 and IPv6
> packets if the tunnel's ttl is not inherited (resulting in DF bit set
> in the tunnel's IPv4 header).
>
> So basically ip_gre worked wrong all along - things just happened to
> work due to GRO/GSO not implemented in ip_gre, and the way (the now
> deleted) routing cache exposed pmtu.
>
> Does this make sense?

Not really. It seems the fragmentation should already happen at the
earlier dst level. Though this implies that GSO cannot be used in
ip_gre if ttl != inherit.

I added some ip_gre debugging and the following seems to happen:

 - the mtu is calculated correctly on the xmit path:
     dst_mtu(&rt->dst) = 1458 (the tunnel's XFRMed IPv4 path)
 - skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
   is called with mtu=1430, which seems correct
 - after the above call, dst_mtu(skb_dst(skb)) still returns 1472,
   which is wrong, so update_pmtu is not working
 - skb->dev->ifindex implies skb->dev points to the gre device when
   update_pmtu is being called (and not the ethX from which the packet
   was received), so ip_rt_update_pmtu(), which eventually calls
   build_skb_flow_key(), is likely using the wrong ifindex for the flow
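
To make the effect of the stale value concrete, here is a minimal
user-space sketch of the decision sequence as I understand it. It is a
model, not kernel code: forward_through_tunnel() and the verdict names
are hypothetical, the inner packet is assumed to have DF clear, and the
per-packet overhead is simply taken as the difference between the two
reported values (1458 - 1430 = 28 bytes, which would be consistent with
a 20-byte IPv4 header plus an 8-byte GRE header, although the breakdown
is an assumption).

```c
#include <assert.h>

/* Values reported by the ip_gre debugging output. */
enum {
	TUNNEL_PATH_MTU    = 1458, /* dst_mtu(&rt->dst), XFRMed IPv4 path */
	EXPECTED_INNER_MTU = 1430, /* mtu handed to update_pmtu() */
	STALE_INNER_MTU    = 1472, /* dst_mtu(skb_dst(skb)) after the call */
};

/* Assumption: encapsulation overhead implied by the two values above. */
enum { ENCAP_OVERHEAD = TUNNEL_PATH_MTU - EXPECTED_INNER_MTU };

enum verdict { SENT_WHOLE, FRAGMENTED_INNER, DROPPED_OUTER_DF };

/*
 * Hypothetical model of forwarding a packet (inner DF clear) into a
 * tunnel whose outer header always carries DF.
 */
enum verdict forward_through_tunnel(int inner_len, int inner_dst_mtu)
{
	assert(inner_len > 0 && inner_dst_mtu > 0);

	/* Forward path: fragment only if the inner dst MTU is exceeded. */
	if (inner_len > inner_dst_mtu)
		return FRAGMENTED_INNER;

	/* Tunnel xmit: the encapsulated packet has DF set, so it cannot
	 * be fragmented if it exceeds the tunnel's path MTU. */
	if (inner_len + ENCAP_OVERHEAD > TUNNEL_PATH_MTU)
		return DROPPED_OUTER_DF;

	return SENT_WHOLE;
}
```

With an illustrative 1460-byte forwarded packet,
forward_through_tunnel(1460, STALE_INNER_MTU) gives DROPPED_OUTER_DF,
while forward_through_tunnel(1460, EXPECTED_INNER_MTU) gives
FRAGMENTED_INNER. That matches the symptom: as long as the forward
route keeps reporting 1472, the packet is never fragmented early enough.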
- Timo