* ip6_tunnel. mtu/pmtu problems.
@ 2010-10-15 12:40 Anders Franzen
2010-10-15 13:09 ` Eric Dumazet
0 siblings, 1 reply; 2+ messages in thread
From: Anders Franzen @ 2010-10-15 12:40 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
Hi, I've noticed that the ip6_tunnel driver completely fails to update
the route when a "bearer" has a lower MTU than what the tunnel device
expects.
Comparing with the ipip tunnel, I found that ip6_tunnel is missing the
following line in ip6_tnl_dev_setup:
dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
Since it's not there, skb->dst has already been released by the time the
tunnel transmits, so all code that depends on skb_dst(skb) returning
something might as well be removed.
That is the update_pmtu and icmp_send handling.
Anyhow, adding that line, which tells the device layer not to release
skb->dst in dev_hard_start_xmit, made things better.
But the encap limit is on by default and it consumes 8 bytes, so the
true MTU for an ip6_tunnel over a 1500-byte Ethernet should be 1452,
not 1460.
As it is now, I lose the first packet every time a new route is created.
I updated the driver to take encap_limit into account, if enabled; now
it works even better.
But I have one problem left, and I can reproduce it on the IPv4 ipip
tunnel as well.
With a slightly asymmetric routing setup, I can get the driver to
generate an ICMP FRAG_NEEDED. If I configure the routing so that
forwarding towards the source of the oversized packet goes via the
tunnel, this happens:
Dead loop on virtual device vip4, fix it urgently!
This is because the device layer has taken a lock on the tx queue of the
device selected for the primary packet (the tunnel), and the tunnel then
wants to send an ICMP on the same device. The lock is still held for
transmission of the primary packet, so the ICMP gets discarded with a
nasty kernel message.
I think this is a valid case, and the dead loop is just an
implementation limitation.
Maybe we should try to schedule the ICMP, to delay it until the sending
of the primary packet has returned and released the lock.
This is the routing setup I use to trigger the Dead loop, both on ipip
tunnels and ip6_tunnels.
We have 4 nodes A,B,C,D
C is a router, routing AB to/from D
B has a tunnel toward C
B has a default route using the tunnel to C
A has a route to D pointing to B
I raise the MTU of the tunnel endpoint at B by a couple of bytes, to
simulate the effect of leaving out the 8 bytes of encap_limit, or of a
bearer device actually indicating a lower MTU than was expected.
Let A run: ping -M do -s 1470 D
A sends to B, and B forwards to the tunnel, which calculates its MTU as
1480 (IPv4) based on its own overhead and the route MTU of the bearer
route. Since we set the MTU of the tunnel higher than that, the tunnel
will send an ICMP back to A, but the route here says that A is reached
via the tunnel itself, and we get the dead loop.
If the lock in the device layer has to be there, then I think the ICMP
should be sent from a kthread or something?
Any comments?
Best regards
Anders
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: ip6_tunnel. mtu/pmtu problems.
2010-10-15 12:40 ip6_tunnel. mtu/pmtu problems Anders Franzen
@ 2010-10-15 13:09 ` Eric Dumazet
0 siblings, 0 replies; 2+ messages in thread
From: Eric Dumazet @ 2010-10-15 13:09 UTC (permalink / raw)
To: Anders Franzen; +Cc: netdev
On Friday, 15 October 2010 at 14:40 +0200, Anders Franzen wrote:
> Hi, I've noticed that the ip6_tunnel driver completely fails to update
> the route when a "bearer" has a lower MTU than what the tunnel device
> expects.
>
> Comparing to the ipip tunnel I found that ip6_tunnel is missing the
> following line in the ip6_tnl_dev_setup:
> dev->priv_flags &= ~IFF_XMIT_DST_RELEASE
>
Good catch :)
> Since it's not there, skb->dst has already been released by the time the
> tunnel transmits, so all code that depends on skb_dst(skb) returning
> something might as well be removed.
> That is the update_pmtu and icmp_send handling.
>
> Anyhow, adding that line, which tells the device layer not to release
> skb->dst in dev_hard_start_xmit, made things better.
>
> But the encap limit is on by default and it consumes 8 bytes, so the
> true MTU for an ip6_tunnel over a 1500-byte Ethernet should be 1452,
> not 1460.
>
> As it is now, I lose the first packet every time a new route is created.
>
> I updated the driver to take encap_limit into account, if enabled; now
> it works even better.
>
> But I have one problem left, and I can reproduce it on the IPv4 ipip
> tunnel as well.
>
> With a slightly asymmetric routing setup, I can get the driver to
> generate an ICMP FRAG_NEEDED. If I configure the routing so that
> forwarding towards the source of the oversized packet goes via the
> tunnel, this happens:
>
> Dead loop on virtual device vip4, fix it urgently!
>
> This is because the device layer has taken a lock on the tx queue of the
> device selected for the primary packet (the tunnel), and the tunnel then
> wants to send an ICMP on the same device. The lock is still held for
> transmission of the primary packet, so the ICMP gets discarded with a
> nasty kernel message.
>
> I think this is a valid case, and the dead loop is just an
> implementation limitation.
>
> Maybe we should try to schedule the ICMP, to delay it until the sending
> of the primary packet has returned and released the lock.
>
> This is the routing setup I use to trigger the Dead loop, both on ipip
> tunnels and ip6_tunnels.
>
>
>
> We have 4 nodes A,B,C,D
>
>
> C is a router, routing AB to/from D
>
> B has a tunnel toward C
> B has a default route using the tunnel to C
>
> A has a route to D pointing to B
>
> I raise the MTU of the tunnel endpoint at B by a couple of bytes, to
> simulate the effect of leaving out the 8 bytes of encap_limit, or of a
> bearer device actually indicating a lower MTU than was expected.
>
> Let A run: ping -M do -s 1470 D
>
> A sends to B, and B forwards to the tunnel, which calculates its MTU as
> 1480 (IPv4) based on its own overhead and the route MTU of the bearer
> route. Since we set the MTU of the tunnel higher than that, the tunnel
> will send an ICMP back to A, but the route here says that A is reached
> via the tunnel itself, and we get the dead loop.
>
> If the lock in the device layer has to be there, then I think the ICMP
> should be sent from a kthread or something?
If the only thing blocking you is the dead loop, please try the
following patch from net-next-2.6, currently a candidate for net-2.6:
commit 745e20f1b626b1be4b100af5d4bf7b3439392f8f
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Wed Sep 29 13:23:09 2010 -0700

    net: add a recursion limit in xmit path

    As tunnel devices are going to be lockless, we need to make sure a
    misconfigured machine wont enter an infinite loop.

    Add a percpu variable, and limit to three the number of stacked xmits.

    Reported-by: Jesse Gross <jesse@nicira.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

http://git2.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=745e20f1b626b1be4b100af5d4bf7b3439392f8f