From: Timo Teras <timo.teras@iki.fi>
To: netdev@vger.kernel.org
Subject: Re: linux-3.6+, gre+ipsec+forwarding = IP fragmentation broken
Date: Fri, 15 Mar 2013 13:38:20 +0200
Message-ID: <20130315133820.006a42f6@vostro>
In-Reply-To: <20130315112516.4b1651ca@vostro>

On Fri, 15 Mar 2013 11:25:16 +0200
Timo Teras <timo.teras@iki.fi> wrote:
> On Wed, 13 Mar 2013 17:14:53 +0200
> Timo Teras <timo.teras@iki.fi> wrote:
>
> > In the typical DMVPN setup with IPv4-ESP-GRE-IPv4 stack, it seems
> > that IPv4 fragmentation broke around 3.6 for forwarded packets.
> >
> > It would seem that fragmentation works for locally generated
> > packets. Also PMTU (DF set) seems to work for both forwarded and
> > locally generated packets. But forwarded packets to gre device that
> > gets IPsec encrypted do not get fragmented properly.
> >
> > 3.4.x kernels work, 3.6 and 3.8 series tested and fail similarly.
>
> Actually 3.4.x vanilla does not work. It works only with 38d523e
> "ipv4: Remove output route check in ipv4_mtu" applied which I've been
> cherry-picking to my builds.
>
> > I was going through the changelog and it seems that MTU is now
> > handled in nexthop exceptions and one needs to produce the full
> > flow info to update it. I'm wondering if this does not hold true in
> > my code path, as ip_gre rewraps the forwarded packet and creates a
> > new IP header - when it next goes to the xfrm code (which sends the
> > ICMP error) the inner iphdr is no longer accessible. Would this
> > cause the breakage that I'm seeing? Or is the forward flow's mtu
> > still updated somehow?
>
> I now have a theory on what goes wrong.
>
> My gre tunnel is configured with 'ttl 64' so the tunnel IP header
> always gets the DF bit set to do proper path-mtu. The kind of locally
> generated ICMP messages I get imply that re-fragmentation happens
> only on the tunnel's IPv4 header level - but it'll be too late then:
> the large packet is queued, IPsec'ed and it is the IPsec'ed packet
> that the stack then tries to fragment (but it has DF set so that fails
> and the packet is dropped).
>
> I believe ip_gre should explicitly fragment the inner IPv4 and IPv6
> packets if the tunnel's ttl is not inherited (resulting in DF bit set
> in the tunnel's IPv4 header).
>
> So basically ip_gre worked wrong all along - things just happened to
> work due to GRO/GSO not being implemented in ip_gre, and the way the
> (now deleted) routing cache exposed pmtu.
>
> Does this make sense?

Not really. It seems the fragmentation should already happen at the
earlier dst level. Though, this implies that GSO cannot be used in
ip_gre if ttl != inherit.

I added some ip_gre debugging, and the following seems to happen:

- the mtu is calculated correctly on the xmit path:
  dst_mtu(&rt->dst) = 1458 (the tunnel's XFRMed IPv4 path)
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
  is called with mtu=1430, which seems correct
- after the above call, dst_mtu(skb_dst(skb)) still returns 1472,
  which is wrong, so update_pmtu is not taking effect
- skb->dev->ifindex implies that skb->dev points to the gre device when
  update_pmtu is called (and not the ethX on which the packet was
  received), so ip_rt_update_pmtu(), which eventually calls
  build_skb_flow_key(), is likely using the wrong ifindex for the flow

- Timo