netdev.vger.kernel.org archive mirror
* iptables CLAMP MSS to PMTU not working?
@ 2012-07-12  9:00 Timo Teras
  2012-07-12 10:24 ` Timo Teras
  0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-12  9:00 UTC (permalink / raw)
  To: netdev

Hi,

We recently noticed that CLAMPMSS to path MTU does not seem to be
working properly. The most recent version tested, linux-3.3.6, does
not work. linux-2.6.35 definitely works, and I suspect it broke
somewhere around 3.0 with the inetpeer changes.

In my case, the destination is reached over a gre tunnel (which gets
routed to the Internet over IPsec transport mode).

The 'ip route' command verifies that on both boxes the path MTU is
detected properly. That is, in both cases the static route MTU is
higher, and after large packets are sent, an ICMP frag-needed is
received and the cached route is updated properly.

On the new kernel, I get info like:
# ip route get 10.x.x.x
10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
    cache  expires 68sec ipid 0x3153 mtu 1422

And the older kernel:
# ip route get 10.x.x.x
10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
    cache  expires 595sec ipid 0xd241 mtu 1422 advmss 1432 hoplimit 64

For some reason, iptables CLAMPMSS seems to set incorrect MSS for this
route (or maybe it's using the static route instead?).

Any ideas?

Thanks,
 Timo


* Re: iptables CLAMP MSS to PMTU not working?
  2012-07-12  9:00 iptables CLAMP MSS to PMTU not working? Timo Teras
@ 2012-07-12 10:24 ` Timo Teras
  2012-07-16  5:49   ` Timo Teras
  0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-12 10:24 UTC (permalink / raw)
  To: netdev

Replying to myself with some additional notes.

On Thu, 12 Jul 2012 12:00:21 +0300 Timo Teras <timo.teras@iki.fi> wrote:

> We recently noticed that CLAMPMSS to path MTU does not seem to be
> working properly. Most recently tested version is linux-3.3.6 which
> does not work. linux-2.6.35 works for sure, but I suspect it to have
> broken somewhere around 3.0'ish with the inetpeer changes.
> 
> In my case, the destination is on gre tunnel (that gets routed to
> Internet over IPsec transport mode).
> 
> 'ip route' command verifies that in both boxes the path-MTU is
> detected properly. That, is on both cases the static route MTU is
> higher. And after large packets sent, ICMP frag-needed is received
> and the cache route is updated properly.
> 
> On the new kernel, I get info like:
> # ip route get 10.x.x.x
> 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
>     cache  expires 68sec ipid 0x3153 mtu 1422

CLAMPMSS sets the MSS to 1432, which implies an MTU of 1472
(1432+40=1472). This matches the gre1 interface MTU:

14: gre1: <UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN 

So apparently CLAMPMSS is honoring the static route for gre1, instead
of the cached pmtu route.

> And the older kernel:
> # ip route get 10.x.x.x
> 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
>     cache  expires 595sec ipid 0xd241 mtu 1422 advmss 1432 hoplimit 64
> 
> For some reason, iptables CLAMPMSS seems to set incorrect MSS for this
> route (or maybe it's using the static route instead?).

And in this case the MSS is set to 1382. That is, it is properly
calculated from the path MTU (1422-40=1382). I would expect the advmss
of the cached route to get updated on TCP connects on the older kernels
(the above paste was taken after pinging with large packets, with no
TCP connection made for the cached entry).
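
The arithmetic behind the two MSS values above can be written down as
a small userspace sketch. It assumes plain IPv4 and TCP headers of 20
bytes each with no options; mss_for_mtu() is a hypothetical helper
name used only for illustration, not a kernel function.

```c
#include <assert.h>

/* Assumption: IPv4 and TCP headers without options, 20 bytes each. */
#define IPV4_HDR_LEN 20u
#define TCP_HDR_LEN  20u

/* Hypothetical helper: the TCP MSS that corresponds to an IPv4 MTU. */
unsigned int mss_for_mtu(unsigned int mtu)
{
        /* The MTU must leave room for at least one byte of payload. */
        assert(mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
        return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}
```

With the numbers from this thread, the gre1 interface MTU of 1472
corresponds to the MSS of 1432 seen on the new kernel, and the learned
path MTU of 1422 corresponds to the MSS of 1382 seen on the old one.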

- Timo


* Re: iptables CLAMP MSS to PMTU not working?
  2012-07-12 10:24 ` Timo Teras
@ 2012-07-16  5:49   ` Timo Teras
  2012-07-16  6:20     ` Timo Teras
  0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-16  5:49 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev

On Thu, 12 Jul 2012 13:24:19 +0300 Timo Teras <timo.teras@iki.fi> wrote:

> On Thu, 12 Jul 2012 12:00:21 +0300 Timo Teras <timo.teras@iki.fi>
> wrote:
> 
> > We recently noticed that CLAMPMSS to path MTU does not seem to be
> > working properly. Most recently tested version is linux-3.3.6 which
> > does not work. linux-2.6.35 works for sure, but I suspect it to have
> > broken somewhere around 3.0'ish with the inetpeer changes.
> > 
> > In my case, the destination is on gre tunnel (that gets routed to
> > Internet over IPsec transport mode).
> > 
> > 'ip route' command verifies that in both boxes the path-MTU is
> > detected properly. That, is on both cases the static route MTU is
> > higher. And after large packets sent, ICMP frag-needed is received
> > and the cache route is updated properly.
> > 
> > On the new kernel, I get info like:
> > # ip route get 10.x.x.x
> > 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
> >     cache  expires 68sec ipid 0x3153 mtu 1422
> 
> CLAMP MSS sets MSS to 1432. Which implies MTU 1472. This matches the
> gre1 interface MTU:
> 
> 14: gre1: <UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN 
> 
> So apparently CLAMPMSS is honoring the static route for gre1, instead
> of the cached pmtu route.
> 
> > And the older kernel:
> > # ip route get 10.x.x.x
> > 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
> >     cache  expires 595sec ipid 0xd241 mtu 1422 advmss 1432 hoplimit
> > 64
> > 
> > For some reason, iptables CLAMPMSS seems to set incorrect MSS for
> > this route (or maybe it's using the static route instead?).
> 
> And in this case MSS is set to 1382. That is, it's properly calculated
> from the path MTU (1422-40=1382). I would expect the advmss of the
> cached route to get updated on the TCP connects on the older kernels
> (the above paste is after pinging with large packets and no TCP
> connection done for the cached entry).

Looking at the changelog, this is likely a side effect of:

commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
Author: Steffen Klassert <steffen.klassert@secunet.com>
Date:   Wed Nov 23 02:14:50 2011 +0000

    ipv4: Don't use the cached pmtu informations for input routes

At least from a performance standpoint, it would be better if CLAMPMSS
to PMTU clamped to the learned, cached MTU.


* Re: iptables CLAMP MSS to PMTU not working?
  2012-07-16  5:49   ` Timo Teras
@ 2012-07-16  6:20     ` Timo Teras
  2012-07-16  7:23       ` Steffen Klassert
  0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-16  6:20 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert; +Cc: netdev

On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi> wrote:

> On Thu, 12 Jul 2012 13:24:19 +0300 Timo Teras <timo.teras@iki.fi>
> wrote:
> 
> > On Thu, 12 Jul 2012 12:00:21 +0300 Timo Teras <timo.teras@iki.fi>
> > wrote:
> > 
> > > We recently noticed that CLAMPMSS to path MTU does not seem to be
> > > working properly. Most recently tested version is linux-3.3.6
> > > which does not work. linux-2.6.35 works for sure, but I suspect
> > > it to have broken somewhere around 3.0'ish with the inetpeer
> > > changes.
> > > 
> > > In my case, the destination is on gre tunnel (that gets routed to
> > > Internet over IPsec transport mode).
> > > 
> > > 'ip route' command verifies that in both boxes the path-MTU is
> > > detected properly. That, is on both cases the static route MTU is
> > > higher. And after large packets sent, ICMP frag-needed is received
> > > and the cache route is updated properly.
> > > 
> > > On the new kernel, I get info like:
> > > # ip route get 10.x.x.x
> > > 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
> > >     cache  expires 68sec ipid 0x3153 mtu 1422
> > 
> > CLAMP MSS sets MSS to 1432. Which implies MTU 1472. This matches the
> > gre1 interface MTU:
> > 
> > 14: gre1: <UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN 
> > 
> > So apparently CLAMPMSS is honoring the static route for gre1,
> > instead of the cached pmtu route.
> > 
> > > And the older kernel:
> > > # ip route get 10.x.x.x
> > > 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
> > >     cache  expires 595sec ipid 0xd241 mtu 1422 advmss 1432
> > > hoplimit 64
> > > 
> > > For some reason, iptables CLAMPMSS seems to set incorrect MSS for
> > > this route (or maybe it's using the static route instead?).
> > 
> > And in this case MSS is set to 1382. That is, it's properly
> > calculated from the path MTU (1422-40=1382). I would expect the
> > advmss of the cached route to get updated on the TCP connects on
> > the older kernels (the above paste is after pinging with large
> > packets and no TCP connection done for the cached entry).
> 
> Looking at the changelog, this would likely be side effect of:
> 
> commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> Author: Steffen Klassert <steffen.klassert@secunet.com>
> Date:   Wed Nov 23 02:14:50 2011 +0000
> 
>     ipv4: Don't use the cached pmtu informations for input routes
> 
> At least from performance side, it would be better if CLAMPMSS to PMTU
> would clamp to the learned, cached mtu.

Actually, this is worse: since XFRM is ignored, it breaks
fragmentation for IPsec targets.

Could this be reverted?


* Re: iptables CLAMP MSS to PMTU not working?
  2012-07-16  6:20     ` Timo Teras
@ 2012-07-16  7:23       ` Steffen Klassert
  2012-07-16  7:55         ` Timo Teras
  0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2012-07-16  7:23 UTC (permalink / raw)
  To: Timo Teras; +Cc: David S. Miller, netdev

On Mon, Jul 16, 2012 at 09:20:58AM +0300, Timo Teras wrote:
> On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi> wrote:
> 
> > Looking at the changelog, this would likely be side effect of:
> > 
> > commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> > Author: Steffen Klassert <steffen.klassert@secunet.com>
> > Date:   Wed Nov 23 02:14:50 2011 +0000
> > 
> >     ipv4: Don't use the cached pmtu informations for input routes
> > 
> > At least from performance side, it would be better if CLAMPMSS to PMTU
> > would clamp to the learned, cached mtu.
> 
> Actually, this is worse. Since XFRM is ignored - it breaks
> fragmentation for IPsec targets.
> 
> Could this be reverted?

I wrote this patch to avoid propagating learned PMTU information.
It restores the behaviour we had before we moved the PMTU information
to the inetpeer. Unfortunately CLAMPMSS really wants the PMTU
information of an input route, which is no longer available after
this patch.

Anyway, this patch seems to be obsolete in the net-next tree, as
the cached PMTU information is back in the route. So we should remove
the check for an output route from ipv4_mtu() in the net-next tree.
That should get CLAMPMSS working again, at least for upcoming
kernel versions.


* Re: iptables CLAMP MSS to PMTU not working?
  2012-07-16  7:23       ` Steffen Klassert
@ 2012-07-16  7:55         ` Timo Teras
  2012-07-16 10:08           ` Steffen Klassert
  0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2012-07-16  7:55 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev

On Mon, 16 Jul 2012 09:23:05 +0200 Steffen Klassert
<steffen.klassert@secunet.com> wrote:

> On Mon, Jul 16, 2012 at 09:20:58AM +0300, Timo Teras wrote:
> > On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi>
> > wrote:
> > 
> > > Looking at the changelog, this would likely be side effect of:
> > > 
> > > commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> > > Author: Steffen Klassert <steffen.klassert@secunet.com>
> > > Date:   Wed Nov 23 02:14:50 2011 +0000
> > > 
> > >     ipv4: Don't use the cached pmtu informations for input routes
> > > 
> > > At least from performance side, it would be better if CLAMPMSS to
> > > PMTU would clamp to the learned, cached mtu.
> > 
> > Actually, this is worse. Since XFRM is ignored - it breaks
> > fragmentation for IPsec targets.
> > 
> > Could this be reverted?
> 
> I did this patch to avoid to propagate learned PMTU informations.
> It restores the behaviour we had before we moved the PMTU informations
> to the inetpeer. Unfortunately CLAMPMSS really wants to have the PMTU
> informations of an input route, which is not possible any more after
> this patch.
>
> Anyway, this patch seems to be obsolete in the net-next tree, as
> the cached pmtu informations are back in the route. So we should
> remove the check for an output route from ipv4_mtu() in the net-next
> tree. This should bring CLAMPMSS back to work, at least for upcoming
> kernel versions.

Right, I saw those commits. But before net-next hits a release, I
really need a fix for 3.3/3.4/3.5. Non-working fragmentation with
IPsec and this CLAMPMSS issue are upgrade stoppers for me.

Would it be safe to just revert this commit, with the side effect of
exposing the cached pmtu too aggressively?

Or would it be better to try to backport the relevant changes of moving
pmtu back to route table?


* Re: iptables CLAMP MSS to PMTU not working?
  2012-07-16  7:55         ` Timo Teras
@ 2012-07-16 10:08           ` Steffen Klassert
  2012-07-16 10:53             ` Timo Teras
  0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2012-07-16 10:08 UTC (permalink / raw)
  To: Timo Teras; +Cc: David S. Miller, netdev

On Mon, Jul 16, 2012 at 10:55:46AM +0300, Timo Teras wrote:
> On Mon, 16 Jul 2012 09:23:05 +0200 Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> > 
> > I did this patch to avoid to propagate learned PMTU informations.
> > It restores the behaviour we had before we moved the PMTU informations
> > to the inetpeer. Unfortunately CLAMPMSS really wants to have the PMTU
> > informations of an input route, which is not possible any more after
> > this patch.
> >
> > Anyway, this patch seems to be obsolete in the net-next tree, as
> > the cached pmtu informations are back in the route. So we should
> > remove the check for an output route from ipv4_mtu() in the net-next
> > tree. This should bring CLAMPMSS back to work, at least for upcoming
> > kernel versions.
> 
> Right, saw those commits. But before net-next hits release, I'd really
> need a fix for 3.3/3.4/3.5. Non-working fragmentation with IPsec, and
> this CLAMPMSS thingy are an upgrade stopper for me.
> 
> Would it be safe to just revert this commit, with the side-effect of
> exposing cached pmtu too agressively?

The router that can't send the packet to the next hop network has to
send the ICMP Destination Unreachable message. We never propagated
learned PMTU information and I would not like to change this,
particularly not in a stable kernel.

Maybe we could fix this for already released kernels within the
netfilter module. Perhaps we could add a function

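/* Prefer the MTU metric stored on this route entry; if none is set,
 * fall back to dst_mtu() on the path dst. */
static unsigned int tcpmss_mtu(const struct dst_entry *dst)
{
        unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);

        /* "a ? : b" is the GNU C shorthand for "a ? a : b". */
        return mtu ? : dst_mtu(dst->path);
}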


and use this instead of dst_mtu().
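
The selection logic of that helper can be tried out in userspace with
a toy model. This is only a sketch under stated assumptions: struct
toy_route and toy_tcpmss_mtu() are hypothetical stand-ins, not the
kernel's struct dst_entry or any existing function, and a stored
value of 0 is taken to mean "no MTU metric set".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a route entry; NOT the kernel's struct dst_entry. */
struct toy_route {
        unsigned int raw_mtu;      /* stored MTU metric, 0 if unset */
        unsigned int fallback_mtu; /* what the fallback lookup yields */
};

/* Use the stored metric when present, otherwise the fallback value. */
unsigned int toy_tcpmss_mtu(const struct toy_route *rt)
{
        assert(rt != NULL);
        return rt->raw_mtu ? rt->raw_mtu : rt->fallback_mtu;
}
```

With a stored metric of 1422 and a fallback of 1472 the model returns
1422; with no stored metric it returns 1472.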

> 
> Or would it be better to try to backport the relevant changes of moving
> pmtu back to route table?

A backport is probably too invasive for a stable kernel.


* Re: iptables CLAMP MSS to PMTU not working?
  2012-07-16 10:08           ` Steffen Klassert
@ 2012-07-16 10:53             ` Timo Teras
  0 siblings, 0 replies; 8+ messages in thread
From: Timo Teras @ 2012-07-16 10:53 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev

On Mon, 16 Jul 2012 12:08:44 +0200 Steffen Klassert
<steffen.klassert@secunet.com> wrote:

> On Mon, Jul 16, 2012 at 10:55:46AM +0300, Timo Teras wrote:
> > On Mon, 16 Jul 2012 09:23:05 +0200 Steffen Klassert
> > <steffen.klassert@secunet.com> wrote:
> > > 
> > > I did this patch to avoid to propagate learned PMTU informations.
> > > It restores the behaviour we had before we moved the PMTU
> > > informations to the inetpeer. Unfortunately CLAMPMSS really wants
> > > to have the PMTU informations of an input route, which is not
> > > possible any more after this patch.
> > >
> > > Anyway, this patch seems to be obsolete in the net-next tree, as
> > > the cached pmtu informations are back in the route. So we should
> > > remove the check for an output route from ipv4_mtu() in the
> > > net-next tree. This should bring CLAMPMSS back to work, at least
> > > for upcoming kernel versions.
> > 
> > Right, saw those commits. But before net-next hits release, I'd
> > really need a fix for 3.3/3.4/3.5. Non-working fragmentation with
> > IPsec, and this CLAMPMSS thingy are an upgrade stopper for me.
> > 
> > Would it be safe to just revert this commit, with the side-effect of
> > exposing cached pmtu too agressively?
> 
> The router that can't send the packet to the next hop network has to
> send the ICMP Destination Unreachable message. We never propagated
> learned PMTU informations and I would not like to change this, in
> particular not in a stable kernel.

Ah, now I understand what you mean by "propagation": leaking the
PMTU of locally originating traffic out to the forward path.

Makes sense. I'll probably just revert the change locally, as for my
use this should not be a problem.

> Maybe we could fix this for already released kernels within the
> netfilter module. Perhaps we could add a function
> 
> static unsigned int tcpmss_mtu(const struct dst_entry *dst)
> {
>         unsigned int mtu = dst_metric_raw(dst, RTAX_MTU); 
> 
>         return mtu ? : dst_mtu(dst->path);
> }
> 
> and use this instead of dst_mtu().

Yes, of course this fixes TCPMSS; but not IPsec fragmentation which is
even more critical.

On the forward path, if I send large IP packets (e.g. ping -s 1500)
without the DF bit set to a destination that is XFRMed, I now get
blackholed.

This is because the large packet is fragmented according to the forward
route MTU, which now does not account for the XFRM headers, so the
fragments are too large to be sent out. The packets are then dropped on
the floor, and since DF was not set, no ICMP error is sent out.
With fragmentation, the originator is not, and does not want to be,
aware of the PMTU.
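
A rough userspace sketch of why the fragments end up too large,
using the two MTU values from earlier in this thread (route MTU 1472,
learned path MTU 1422). It assumes a 20-byte IPv4 header without
options and the usual rule that the payload of every non-final
fragment is a multiple of 8 bytes; the function names are
hypothetical, and the XFRM overhead is modelled only indirectly,
through the lower learned path MTU.

```c
#include <assert.h>

/* Assumption: IPv4 header without options. */
#define IPV4_HDR_LEN 20u

/* Largest fragment (header + payload) produced when fragmenting to
 * `mtu`; the payload is rounded down to a multiple of 8 bytes. */
unsigned int max_fragment_len(unsigned int mtu)
{
        assert(mtu >= IPV4_HDR_LEN + 8u);
        return IPV4_HDR_LEN + ((mtu - IPV4_HDR_LEN) & ~7u);
}

/* Non-zero if fragments cut for `frag_mtu` fit within `path_mtu`. */
int fragment_fits(unsigned int frag_mtu, unsigned int path_mtu)
{
        return max_fragment_len(frag_mtu) <= path_mtu;
}
```

Fragmenting to the route MTU of 1472 gives fragments of up to 1468
bytes, which do not fit a path MTU of 1422; fragmenting to 1422 gives
fragments of up to 1420 bytes, which do.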

Basically the MTU needs to be accurate for fragmentation to work.

Just using the "device MTU" or "route MTU" is not enough. This is
because XFRMed target MTU is currently learned dynamically. The
per-packet overhead can depend on destination IP (e.g. per-IP SA can
have different hash or encapsulation which affects the header size and
thus the overall MTU).

I guess it would be better if the XFRMed MTUs were calculated properly.
This would avoid one lost packet -- the one that gets sent out using
the device/route MTU, triggers the route pmtu update, and is then
dropped. Further packets get fragmented properly according to the
cached pmtu. It would be nice if this "learning" packet was not needed.


