* iptables CLAMP MSS to PMTU not working?
@ 2012-07-12 9:00 Timo Teras
2012-07-12 10:24 ` Timo Teras
0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-12 9:00 UTC (permalink / raw)
To: netdev

Hi,

We recently noticed that CLAMPMSS to path MTU does not seem to be
working properly. Most recently tested version is linux-3.3.6 which
does not work. linux-2.6.35 works for sure, but I suspect it to have
broken somewhere around 3.0'ish with the inetpeer changes.

In my case, the destination is on gre tunnel (that gets routed to
Internet over IPsec transport mode).

'ip route' command verifies that in both boxes the path-MTU is detected
properly. That is, in both cases the static route MTU is higher. And
after large packets are sent, ICMP frag-needed is received and the
cached route is updated properly.

On the new kernel, I get info like:
# ip route get 10.x.x.x
10.x.x.x via 172.16.y.y dev gre1 src 172.16.z.z
    cache expires 68sec ipid 0x3153 mtu 1422

And the older kernel:
# ip route get 10.x.x.x
10.x.x.x via 172.16.y.y dev gre1 src 172.16.z.z
    cache expires 595sec ipid 0xd241 mtu 1422 advmss 1432 hoplimit 64

For some reason, iptables CLAMPMSS seems to set incorrect MSS for this
route (or maybe it's using the static route instead?).

Any ideas?

Thanks,
Timo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: iptables CLAMP MSS to PMTU not working?
2012-07-12 9:00 iptables CLAMP MSS to PMTU not working? Timo Teras
@ 2012-07-12 10:24 ` Timo Teras
2012-07-16 5:49 ` Timo Teras
0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-12 10:24 UTC (permalink / raw)
To: netdev

To reply to myself with some additional notes.

On Thu, 12 Jul 2012 12:00:21 +0300 Timo Teras <timo.teras@iki.fi> wrote:

> We recently noticed that CLAMPMSS to path MTU does not seem to be
> working properly. Most recently tested version is linux-3.3.6 which
> does not work. linux-2.6.35 works for sure, but I suspect it to have
> broken somewhere around 3.0'ish with the inetpeer changes.
>
> In my case, the destination is on gre tunnel (that gets routed to
> Internet over IPsec transport mode).
>
> 'ip route' command verifies that in both boxes the path-MTU is
> detected properly. That is, in both cases the static route MTU is
> higher. And after large packets are sent, ICMP frag-needed is
> received and the cached route is updated properly.
>
> On the new kernel, I get info like:
> # ip route get 10.x.x.x
> 10.x.x.x via 172.16.y.y dev gre1 src 172.16.z.z
>     cache expires 68sec ipid 0x3153 mtu 1422

CLAMP MSS sets MSS to 1432. Which implies MTU 1472. This matches the
gre1 interface MTU:

14: gre1: <UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN

So apparently CLAMPMSS is honoring the static route for gre1, instead
of the cached pmtu route.

> And the older kernel:
> # ip route get 10.x.x.x
> 10.x.x.x via 172.16.y.y dev gre1 src 172.16.z.z
>     cache expires 595sec ipid 0xd241 mtu 1422 advmss 1432 hoplimit 64
>
> For some reason, iptables CLAMPMSS seems to set incorrect MSS for this
> route (or maybe it's using the static route instead?).

And in this case MSS is set to 1382. That is, it's properly calculated
from the path MTU (1422-40=1382). I would expect the advmss of the
cached route to get updated on the TCP connects on the older kernels
(the above paste is after pinging with large packets and no TCP
connection done for the cached entry).

- Timo

^ permalink raw reply [flat|nested] 8+ messages in thread
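
The MSS figures in this message follow from subtracting 40 bytes from
the MTU: a 20-byte IPv4 header plus a 20-byte TCP header, both without
options. A minimal C sketch of that arithmetic is given here;
mss_for_mtu() is a hypothetical helper name used only for illustration
and is not a kernel function.

```c
/* IPv4 header (20 bytes) plus TCP header (20 bytes), neither with options. */
#define IPV4_TCP_OVERHEAD 40u

/*
 * Hypothetical helper (not a kernel function): the largest TCP MSS that
 * lets one full segment fit into a packet of the given MTU.
 * Returns 0 when the MTU cannot even hold the two headers.
 */
static unsigned int mss_for_mtu(unsigned int mtu)
{
	return mtu > IPV4_TCP_OVERHEAD ? mtu - IPV4_TCP_OVERHEAD : 0;
}
```

Under this model the gre1 device MTU of 1472 maps to an MSS of 1432,
while the cached path MTU of 1422 maps to 1382, which are the two
values reported for the new and the old kernel respectively.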
* Re: iptables CLAMP MSS to PMTU not working?
2012-07-12 10:24 ` Timo Teras
@ 2012-07-16 5:49 ` Timo Teras
2012-07-16 6:20 ` Timo Teras
0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-16 5:49 UTC (permalink / raw)
To: Steffen Klassert; +Cc: netdev

On Thu, 12 Jul 2012 13:24:19 +0300 Timo Teras <timo.teras@iki.fi> wrote:

> On Thu, 12 Jul 2012 12:00:21 +0300 Timo Teras <timo.teras@iki.fi>
> wrote:
>
> > We recently noticed that CLAMPMSS to path MTU does not seem to be
> > working properly. Most recently tested version is linux-3.3.6 which
> > does not work. linux-2.6.35 works for sure, but I suspect it to have
> > broken somewhere around 3.0'ish with the inetpeer changes.
> >
> > In my case, the destination is on gre tunnel (that gets routed to
> > Internet over IPsec transport mode).
> >
> > 'ip route' command verifies that in both boxes the path-MTU is
> > detected properly. That is, in both cases the static route MTU is
> > higher. And after large packets are sent, ICMP frag-needed is
> > received and the cached route is updated properly.
> >
> > On the new kernel, I get info like:
> > # ip route get 10.x.x.x
> > 10.x.x.x via 172.16.y.y dev gre1 src 172.16.z.z
> >     cache expires 68sec ipid 0x3153 mtu 1422
>
> CLAMP MSS sets MSS to 1432. Which implies MTU 1472. This matches the
> gre1 interface MTU:
>
> 14: gre1: <UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN
>
> So apparently CLAMPMSS is honoring the static route for gre1, instead
> of the cached pmtu route.
>
> > And the older kernel:
> > # ip route get 10.x.x.x
> > 10.x.x.x via 172.16.y.y dev gre1 src 172.16.z.z
> >     cache expires 595sec ipid 0xd241 mtu 1422 advmss 1432 hoplimit 64
> >
> > For some reason, iptables CLAMPMSS seems to set incorrect MSS for
> > this route (or maybe it's using the static route instead?).
>
> And in this case MSS is set to 1382. That is, it's properly calculated
> from the path MTU (1422-40=1382). I would expect the advmss of the
> cached route to get updated on the TCP connects on the older kernels
> (the above paste is after pinging with large packets and no TCP
> connection done for the cached entry).

Looking at the changelog, this would likely be a side effect of:

commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
Author: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed Nov 23 02:14:50 2011 +0000

    ipv4: Don't use the cached pmtu informations for input routes

At least from the performance side, it would be better if CLAMPMSS to
PMTU would clamp to the learned, cached mtu.

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: iptables CLAMP MSS to PMTU not working?
2012-07-16 5:49 ` Timo Teras
@ 2012-07-16 6:20 ` Timo Teras
2012-07-16 7:23 ` Steffen Klassert
0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-16 6:20 UTC (permalink / raw)
To: David S. Miller, Steffen Klassert; +Cc: netdev

On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi> wrote:

> On Thu, 12 Jul 2012 13:24:19 +0300 Timo Teras <timo.teras@iki.fi>
> wrote:
>
> > On Thu, 12 Jul 2012 12:00:21 +0300 Timo Teras <timo.teras@iki.fi>
> > wrote:
> >
> > > We recently noticed that CLAMPMSS to path MTU does not seem to be
> > > working properly. Most recently tested version is linux-3.3.6
> > > which does not work. linux-2.6.35 works for sure, but I suspect
> > > it to have broken somewhere around 3.0'ish with the inetpeer
> > > changes.
> > >
> > > In my case, the destination is on gre tunnel (that gets routed to
> > > Internet over IPsec transport mode).
> > >
> > > 'ip route' command verifies that in both boxes the path-MTU is
> > > detected properly. That is, in both cases the static route MTU is
> > > higher. And after large packets are sent, ICMP frag-needed is
> > > received and the cached route is updated properly.
> > >
> > > On the new kernel, I get info like:
> > > # ip route get 10.x.x.x
> > > 10.x.x.x via 172.16.y.y dev gre1 src 172.16.z.z
> > >     cache expires 68sec ipid 0x3153 mtu 1422
> >
> > CLAMP MSS sets MSS to 1432. Which implies MTU 1472. This matches the
> > gre1 interface MTU:
> >
> > 14: gre1: <UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN
> >
> > So apparently CLAMPMSS is honoring the static route for gre1,
> > instead of the cached pmtu route.
> >
> > > And the older kernel:
> > > # ip route get 10.x.x.x
> > > 10.x.x.x via 172.16.y.y dev gre1 src 172.16.z.z
> > >     cache expires 595sec ipid 0xd241 mtu 1422 advmss 1432 hoplimit 64
> > >
> > > For some reason, iptables CLAMPMSS seems to set incorrect MSS for
> > > this route (or maybe it's using the static route instead?).
> >
> > And in this case MSS is set to 1382. That is, it's properly
> > calculated from the path MTU (1422-40=1382). I would expect the
> > advmss of the cached route to get updated on the TCP connects on
> > the older kernels (the above paste is after pinging with large
> > packets and no TCP connection done for the cached entry).
>
> Looking at the changelog, this would likely be a side effect of:
>
> commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> Author: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Wed Nov 23 02:14:50 2011 +0000
>
>     ipv4: Don't use the cached pmtu informations for input routes
>
> At least from the performance side, it would be better if CLAMPMSS to
> PMTU would clamp to the learned, cached mtu.

Actually, this is worse. Since XFRM is ignored, it breaks
fragmentation for IPsec targets.

Could this be reverted?

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: iptables CLAMP MSS to PMTU not working?
2012-07-16 6:20 ` Timo Teras
@ 2012-07-16 7:23 ` Steffen Klassert
2012-07-16 7:55 ` Timo Teras
0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2012-07-16 7:23 UTC (permalink / raw)
To: Timo Teras; +Cc: David S. Miller, netdev

On Mon, Jul 16, 2012 at 09:20:58AM +0300, Timo Teras wrote:
> On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi> wrote:
>
> > Looking at the changelog, this would likely be a side effect of:
> >
> > commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> > Author: Steffen Klassert <steffen.klassert@secunet.com>
> > Date: Wed Nov 23 02:14:50 2011 +0000
> >
> >     ipv4: Don't use the cached pmtu informations for input routes
> >
> > At least from the performance side, it would be better if CLAMPMSS
> > to PMTU would clamp to the learned, cached mtu.
>
> Actually, this is worse. Since XFRM is ignored, it breaks
> fragmentation for IPsec targets.
>
> Could this be reverted?

I did this patch to avoid propagating learned PMTU information.
It restores the behaviour we had before we moved the PMTU information
to the inetpeer. Unfortunately CLAMPMSS really wants to have the PMTU
information of an input route, which is not possible any more after
this patch.

Anyway, this patch seems to be obsolete in the net-next tree, as
the cached pmtu information is back in the route. So we should
remove the check for an output route from ipv4_mtu() in the net-next
tree. This should bring CLAMPMSS back to work, at least for upcoming
kernel versions.

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: iptables CLAMP MSS to PMTU not working?
2012-07-16 7:23 ` Steffen Klassert
@ 2012-07-16 7:55 ` Timo Teras
2012-07-16 10:08 ` Steffen Klassert
0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-16 7:55 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David S. Miller, netdev

On Mon, 16 Jul 2012 09:23:05 +0200 Steffen Klassert
<steffen.klassert@secunet.com> wrote:

> On Mon, Jul 16, 2012 at 09:20:58AM +0300, Timo Teras wrote:
> > On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi>
> > wrote:
> >
> > > Looking at the changelog, this would likely be a side effect of:
> > >
> > > commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> > > Author: Steffen Klassert <steffen.klassert@secunet.com>
> > > Date: Wed Nov 23 02:14:50 2011 +0000
> > >
> > >     ipv4: Don't use the cached pmtu informations for input routes
> > >
> > > At least from the performance side, it would be better if CLAMPMSS
> > > to PMTU would clamp to the learned, cached mtu.
> >
> > Actually, this is worse. Since XFRM is ignored, it breaks
> > fragmentation for IPsec targets.
> >
> > Could this be reverted?
>
> I did this patch to avoid propagating learned PMTU information.
> It restores the behaviour we had before we moved the PMTU information
> to the inetpeer. Unfortunately CLAMPMSS really wants to have the PMTU
> information of an input route, which is not possible any more after
> this patch.
>
> Anyway, this patch seems to be obsolete in the net-next tree, as
> the cached pmtu information is back in the route. So we should
> remove the check for an output route from ipv4_mtu() in the net-next
> tree. This should bring CLAMPMSS back to work, at least for upcoming
> kernel versions.

Right, saw those commits. But before net-next hits release, I'd really
need a fix for 3.3/3.4/3.5. Non-working fragmentation with IPsec, and
this CLAMPMSS thingy are an upgrade stopper for me.

Would it be safe to just revert this commit, with the side-effect of
exposing cached pmtu too aggressively?

Or would it be better to try to backport the relevant changes of moving
pmtu back to route table?

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: iptables CLAMP MSS to PMTU not working?
2012-07-16 7:55 ` Timo Teras
@ 2012-07-16 10:08 ` Steffen Klassert
2012-07-16 10:53 ` Timo Teras
0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2012-07-16 10:08 UTC (permalink / raw)
To: Timo Teras; +Cc: David S. Miller, netdev

On Mon, Jul 16, 2012 at 10:55:46AM +0300, Timo Teras wrote:
> On Mon, 16 Jul 2012 09:23:05 +0200 Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> >
> > I did this patch to avoid propagating learned PMTU information.
> > It restores the behaviour we had before we moved the PMTU information
> > to the inetpeer. Unfortunately CLAMPMSS really wants to have the PMTU
> > information of an input route, which is not possible any more after
> > this patch.
> >
> > Anyway, this patch seems to be obsolete in the net-next tree, as
> > the cached pmtu information is back in the route. So we should
> > remove the check for an output route from ipv4_mtu() in the net-next
> > tree. This should bring CLAMPMSS back to work, at least for upcoming
> > kernel versions.
>
> Right, saw those commits. But before net-next hits release, I'd really
> need a fix for 3.3/3.4/3.5. Non-working fragmentation with IPsec, and
> this CLAMPMSS thingy are an upgrade stopper for me.
>
> Would it be safe to just revert this commit, with the side-effect of
> exposing cached pmtu too aggressively?

The router that can't send the packet to the next hop network has to
send the ICMP Destination Unreachable message. We never propagated
learned PMTU information and I would not like to change this, in
particular not in a stable kernel.

Maybe we could fix this for already released kernels within the
netfilter module. Perhaps we could add a function

static unsigned int tcpmss_mtu(const struct dst_entry *dst)
{
	unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);

	return mtu ? : dst_mtu(dst->path);
}

and use this instead of dst_mtu().

>
> Or would it be better to try to backport the relevant changes of moving
> pmtu back to route table?

A backport is probably too invasive for a stable kernel.

^ permalink raw reply [flat|nested] 8+ messages in thread
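
The difference between dst_mtu() with the output-route check and the
tcpmss_mtu() helper proposed in this message can be modelled in plain
userspace C. This is only a sketch under stated assumptions: struct
mock_dst, mock_dst_mtu() and mock_tcpmss_mtu() are hypothetical
stand-ins, not the kernel's struct dst_entry, dst_mtu() or
dst_metric_raw(), and the model keeps only the behaviour described in
the thread (a cached MTU metric that is honoured for output routes
only, with the device MTU as fallback).

```c
/* Userspace mock; this is NOT the kernel's struct dst_entry. */
struct mock_dst {
	unsigned int raw_mtu;        /* cached/learned MTU metric, 0 if none */
	unsigned int dev_mtu;        /* MTU of the outgoing device */
	int is_output;               /* nonzero for locally originated routes */
	const struct mock_dst *path; /* underlying path entry (may be itself) */
};

/*
 * Models the behaviour discussed in the thread: the cached metric is
 * used only for output routes, otherwise the device MTU is returned.
 */
static unsigned int mock_dst_mtu(const struct mock_dst *dst)
{
	if (dst->raw_mtu && dst->is_output)
		return dst->raw_mtu;
	return dst->dev_mtu;
}

/*
 * Models the proposed helper: read the raw metric directly, bypassing
 * the output-route check, and fall back to the path's MTU if unset.
 */
static unsigned int mock_tcpmss_mtu(const struct mock_dst *dst)
{
	unsigned int mtu = dst->raw_mtu;

	return mtu ? mtu : mock_dst_mtu(dst->path);
}
```

For a forwarded (input) route with a cached metric of 1422 on a device
with MTU 1472, the first function yields 1472 and the second 1422,
which corresponds to the MSS values 1432 and 1382 seen earlier in the
thread.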
* Re: iptables CLAMP MSS to PMTU not working?
2012-07-16 10:08 ` Steffen Klassert
@ 2012-07-16 10:53 ` Timo Teras
0 siblings, 0 replies; 8+ messages in thread
From: Timo Teras @ 2012-07-16 10:53 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David S. Miller, netdev

On Mon, 16 Jul 2012 12:08:44 +0200 Steffen Klassert
<steffen.klassert@secunet.com> wrote:

> On Mon, Jul 16, 2012 at 10:55:46AM +0300, Timo Teras wrote:
> > On Mon, 16 Jul 2012 09:23:05 +0200 Steffen Klassert
> > <steffen.klassert@secunet.com> wrote:
> > >
> > > I did this patch to avoid propagating learned PMTU information.
> > > It restores the behaviour we had before we moved the PMTU
> > > information to the inetpeer. Unfortunately CLAMPMSS really wants
> > > to have the PMTU information of an input route, which is not
> > > possible any more after this patch.
> > >
> > > Anyway, this patch seems to be obsolete in the net-next tree, as
> > > the cached pmtu information is back in the route. So we should
> > > remove the check for an output route from ipv4_mtu() in the
> > > net-next tree. This should bring CLAMPMSS back to work, at least
> > > for upcoming kernel versions.
> >
> > Right, saw those commits. But before net-next hits release, I'd
> > really need a fix for 3.3/3.4/3.5. Non-working fragmentation with
> > IPsec, and this CLAMPMSS thingy are an upgrade stopper for me.
> >
> > Would it be safe to just revert this commit, with the side-effect of
> > exposing cached pmtu too aggressively?
>
> The router that can't send the packet to the next hop network has to
> send the ICMP Destination Unreachable message. We never propagated
> learned PMTU information and I would not like to change this, in
> particular not in a stable kernel.

Ah, now I understand what you mean by "propagation": leaking out the
PMTU of locally originating traffic to the forward path. Makes sense.

I'll probably just revert the change locally, as for my use this
should not be a problem.

> Maybe we could fix this for already released kernels within the
> netfilter module. Perhaps we could add a function
>
> static unsigned int tcpmss_mtu(const struct dst_entry *dst)
> {
> 	unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);
>
> 	return mtu ? : dst_mtu(dst->path);
> }
>
> and use this instead of dst_mtu().

Yes, of course this fixes TCPMSS; but not IPsec fragmentation which is
even more critical.

On the forward path, if I send large IP packets (e.g. ping -s 1500)
with the DF bit not set to a destination that is XFRMed, I now get
blackholed. This is because the large packet is fragmented according
to the forward route MTU, which now does not account for the XFRM
headers, and the fragments are too large to be sent out. This causes
the packets to get dropped on the floor; and since DF was not set,
there's no ICMP error sent out. With fragmentation the originator is
not and does not want to be aware of the PMTU.

Basically the MTU needs to be accurate for fragmentation to work. Just
using the "device MTU" or "route MTU" is not enough. This is because
the XFRMed target MTU is currently learned dynamically. The per-packet
overhead can depend on the destination IP (e.g. a per-IP SA can have a
different hash or encapsulation, which affects the header size and
thus the overall MTU).

I guess it would be better if the XFRMed MTUs were calculated
properly. This would avoid one lost packet: the one that gets sent out
using the device/route MTU, then triggers the route pmtu to get
updated, and is dropped. Further packets then get fragmented according
to the cached pmtu properly. Would be nice if this "learning" packet
was not needed.

^ permalink raw reply [flat|nested] 8+ messages in thread
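
The blackhole described in this message can be illustrated with a
small C model. It is only a sketch: size_after_xfrm() and
fragment_is_sendable() are hypothetical helper names, and the 50-byte
per-packet overhead used in the example afterwards is an assumed
figure chosen for illustration, not a value stated in the thread.

```c
/* Size of a fragment once the transform has added its per-packet overhead. */
static unsigned int size_after_xfrm(unsigned int frag_size,
				    unsigned int xfrm_overhead)
{
	return frag_size + xfrm_overhead;
}

/*
 * Returns 1 if a fragment cut to frag_mtu bytes can still be sent once
 * xfrm_overhead bytes have been added, given that at most link_mtu
 * bytes fit on the wire; returns 0 if it would be dropped.
 */
static int fragment_is_sendable(unsigned int frag_mtu,
				unsigned int xfrm_overhead,
				unsigned int link_mtu)
{
	return size_after_xfrm(frag_mtu, xfrm_overhead) <= link_mtu;
}
```

With an assumed overhead of 50 bytes and 1472 bytes available,
fragments cut to the route MTU of 1472 grow to 1522 bytes and are
dropped, whereas fragments cut to the learned MTU of 1422 grow to
exactly 1472 bytes and can be sent.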