* [PATCH net-next 1/2] ipv6: force RTF_NONEXTHOP for SIT device
@ 2012-09-12 12:01 Eric Dumazet
2012-09-12 20:53 ` Maciej Żenczykowski
2012-09-13 2:59 ` Eric Dumazet
0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2012-09-12 12:01 UTC (permalink / raw)
To: David Miller
Cc: netdev, Lorenzo Colitti, Maciej Żenczykowski, Tom Herbert
From: Eric Dumazet <edumazet@google.com>
We have special handling of SIT devices in addrconf_prefix_route()
to avoid using a neighbour for each destination.
If routing entry is :
ip -6 route add 2001:db8::/64 dev sit1
Then the kernel will create a new route for every new address
under 2001:db8::/64 that we send a packet to (potentially, 2^64
routes).
Under load, we immediately get the infamous "Neighbour table overflow"
message and machine eventually crash.
This does not happen if we specify a next-hop explicitly, like so:
ip -6 route add 2001:db8::/64 via fe80:: dev sit1
We can avoid this hassle doing the SIT test in ip6_route_add() instead
of addrconf_prefix_route().
This permits ip6_pol_route() to clone route instead of calling
rt6_alloc_cow() and allocate a neighbour
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
---
net/ipv6/addrconf.c | 10 ----------
net/ipv6/route.c | 9 +++++++++
2 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1237d5d..c6837d2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1679,16 +1679,6 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
};
cfg.fc_dst = *pfx;
-
- /* Prevent useless cloning on PtP SIT.
- This thing is done here expecting that the whole
- class of non-broadcast devices need not cloning.
- */
-#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
- if (dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
- cfg.fc_flags |= RTF_NONEXTHOP;
-#endif
-
ip6_route_add(&cfg);
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 399613b..d4ba3fc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1540,6 +1540,15 @@ int ip6_route_add(struct fib6_config *cfg)
} else
rt->rt6i_prefsrc.plen = 0;
+ /* Prevent useless cloning on PtP SIT.
+ * This thing is done here expecting that the whole
+ * class of non-broadcast devices need not cloning.
+ */
+#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
+ if (dev && dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
+ cfg->fc_flags |= RTF_NONEXTHOP;
+#endif
+
if (cfg->fc_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) {
err = rt6_bind_neighbour(rt, dev);
if (err)
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net-next 1/2] ipv6: force RTF_NONEXTHOP for SIT device
2012-09-12 12:01 [PATCH net-next 1/2] ipv6: force RTF_NONEXTHOP for SIT device Eric Dumazet
@ 2012-09-12 20:53 ` Maciej Żenczykowski
2012-09-13 2:59 ` Eric Dumazet
1 sibling, 0 replies; 6+ messages in thread
From: Maciej Żenczykowski @ 2012-09-12 20:53 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Lorenzo Colitti, Tom Herbert
On Wed, Sep 12, 2012 at 5:01 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> We have special handling of SIT devices in addrconf_prefix_route()
> to avoid using a neighbour for each destination.
>
> If routing entry is :
>
> ip -6 route add 2001:db8::/64 dev sit1
>
> Then the kernel will create a new route for every new address
> under 2001:db8::/64 that we send a packet to (potentially, 2^64
> routes).
>
> Under load, we immediately get the infamous "Neighbour table overflow"
> message and machine eventually crash.
>
> This does not happen if we specify a next-hop explicitly, like so:
>
> ip -6 route add 2001:db8::/64 via fe80:: dev sit1
>
> We can avoid this hassle doing the SIT test in ip6_route_add() instead
> of addrconf_prefix_route().
>
> This permits ip6_pol_route() to clone route instead of calling
> rt6_alloc_cow() and allocate a neighbour
>
> Reported-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Tom Herbert <therbert@google.com>
> ---
> net/ipv6/addrconf.c | 10 ----------
> net/ipv6/route.c | 9 +++++++++
> 2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 1237d5d..c6837d2 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1679,16 +1679,6 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
> };
>
> cfg.fc_dst = *pfx;
> -
> - /* Prevent useless cloning on PtP SIT.
> - This thing is done here expecting that the whole
> - class of non-broadcast devices need not cloning.
> - */
> -#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
> - if (dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
> - cfg.fc_flags |= RTF_NONEXTHOP;
> -#endif
> -
> ip6_route_add(&cfg);
> }
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 399613b..d4ba3fc 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1540,6 +1540,15 @@ int ip6_route_add(struct fib6_config *cfg)
> } else
> rt->rt6i_prefsrc.plen = 0;
>
> + /* Prevent useless cloning on PtP SIT.
> + * This thing is done here expecting that the whole
> + * class of non-broadcast devices need not cloning.
> + */
> +#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
> + if (dev && dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
> + cfg->fc_flags |= RTF_NONEXTHOP;
> +#endif
> +
> if (cfg->fc_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) {
> err = rt6_bind_neighbour(rt, dev);
> if (err)
>
>
Acked-by: Maciej Żenczykowski <maze@google.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next 1/2] ipv6: force RTF_NONEXTHOP for SIT device
2012-09-12 12:01 [PATCH net-next 1/2] ipv6: force RTF_NONEXTHOP for SIT device Eric Dumazet
2012-09-12 20:53 ` Maciej Żenczykowski
@ 2012-09-13 2:59 ` Eric Dumazet
2012-09-13 3:15 ` [PATCH v2 net-next] ipv6: prevent useless neigh alloc on PTP or lo routes Eric Dumazet
1 sibling, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2012-09-13 2:59 UTC (permalink / raw)
To: David Miller
Cc: netdev, Lorenzo Colitti, Maciej Żenczykowski, Tom Herbert
On Wed, 2012-09-12 at 14:01 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> We have special handling of SIT devices in addrconf_prefix_route()
> to avoid using a neighbour for each destination.
>
> If routing entry is :
>
> ip -6 route add 2001:db8::/64 dev sit1
>
> Then the kernel will create a new route for every new address
> under 2001:db8::/64 that we send a packet to (potentially, 2^64
> routes).
>
> Under load, we immediately get the infamous "Neighbour table overflow"
> message and machine eventually crash.
>
> This does not happen if we specify a next-hop explicitly, like so:
>
> ip -6 route add 2001:db8::/64 via fe80:: dev sit1
>
> We can avoid this hassle doing the SIT test in ip6_route_add() instead
> of addrconf_prefix_route().
>
> This permits ip6_pol_route() to clone route instead of calling
> rt6_alloc_cow() and allocate a neighbour
>
> Reported-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Tom Herbert <therbert@google.com>
> ---
> net/ipv6/addrconf.c | 10 ----------
> net/ipv6/route.c | 9 +++++++++
> 2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 1237d5d..c6837d2 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1679,16 +1679,6 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
> };
>
> cfg.fc_dst = *pfx;
> -
> - /* Prevent useless cloning on PtP SIT.
> - This thing is done here expecting that the whole
> - class of non-broadcast devices need not cloning.
> - */
> -#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
> - if (dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
> - cfg.fc_flags |= RTF_NONEXTHOP;
> -#endif
> -
> ip6_route_add(&cfg);
> }
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 399613b..d4ba3fc 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1540,6 +1540,15 @@ int ip6_route_add(struct fib6_config *cfg)
> } else
> rt->rt6i_prefsrc.plen = 0;
>
> + /* Prevent useless cloning on PtP SIT.
> + * This thing is done here expecting that the whole
> + * class of non-broadcast devices need not cloning.
> + */
> +#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
> + if (dev && dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
> + cfg->fc_flags |= RTF_NONEXTHOP;
> +#endif
> +
> if (cfg->fc_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) {
> err = rt6_bind_neighbour(rt, dev);
> if (err)
>
Please hold this patch, I'll send a v2, based on excellent feedback from
Lorenzo.
Idea is to just do :
if (dev->flags & (IFF_POINTOPOINT | IFF_LOOPBACK))
cfg->fc_flags |= RFT_NONEXTHOP;
(no mention of SIT anymore, and a change in the title patch)
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 net-next] ipv6: prevent useless neigh alloc on PTP or lo routes
2012-09-13 2:59 ` Eric Dumazet
@ 2012-09-13 3:15 ` Eric Dumazet
2012-09-13 21:13 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2012-09-13 3:15 UTC (permalink / raw)
To: David Miller
Cc: netdev, Lorenzo Colitti, Maciej Żenczykowski, Tom Herbert,
Willem de Bruijn
From: Eric Dumazet <edumazet@google.com>
We have special handling of SIT devices in addrconf_prefix_route()
to avoid allocating a neighbour for each destination.
If routing entry is :
ip -6 route add 2001:db8::/64 dev sit1
Then the kernel will create a new route and neighbour for every new
address under 2001:db8::/64 that we send a packet to
(potentially, 2^64 routes and neighbours).
Under load, we immediately get the infamous "Neighbour table overflow"
message and machine eventually crash.
This does not happen if we specify a next-hop explicitly, like so:
ip -6 route add 2001:db8::/64 via fe80:: dev sit1
Same problem happens if we use routes to loopback.
Idea of this patch is to move existing SIT related code from
addrconf_prefix_route() to a more generic one in ip6_route_add().
This permits ip6_pol_route() to clone route instead of calling
rt6_alloc_cow() and allocate a neighbour.
Many thanks to Lorenzo for his help and suggestions.
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Willem de Bruijn <willemb@google.com>
---
net/ipv6/addrconf.c | 10 ----------
net/ipv6/route.c | 4 ++++
2 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1237d5d..c6837d2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1679,16 +1679,6 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
};
cfg.fc_dst = *pfx;
-
- /* Prevent useless cloning on PtP SIT.
- This thing is done here expecting that the whole
- class of non-broadcast devices need not cloning.
- */
-#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
- if (dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
- cfg.fc_flags |= RTF_NONEXTHOP;
-#endif
-
ip6_route_add(&cfg);
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 399613b..7df8dfc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1540,6 +1540,10 @@ int ip6_route_add(struct fib6_config *cfg)
} else
rt->rt6i_prefsrc.plen = 0;
+ /* Prevent useless cloning on link types that don't have next hops. */
+ if (dev->flags & (IFF_POINTOPOINT | IFF_LOOPBACK))
+ cfg->fc_flags |= RTF_NONEXTHOP;
+
if (cfg->fc_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) {
err = rt6_bind_neighbour(rt, dev);
if (err)
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 net-next] ipv6: prevent useless neigh alloc on PTP or lo routes
2012-09-13 3:15 ` [PATCH v2 net-next] ipv6: prevent useless neigh alloc on PTP or lo routes Eric Dumazet
@ 2012-09-13 21:13 ` David Miller
2012-09-13 21:51 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2012-09-13 21:13 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, lorenzo, maze, therbert, willemb
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 13 Sep 2012 05:15:58 +0200
> From: Eric Dumazet <edumazet@google.com>
>
> We have special handling of SIT devices in addrconf_prefix_route()
> to avoid allocating a neighbour for each destination.
>
> If routing entry is :
>
> ip -6 route add 2001:db8::/64 dev sit1
>
> Then the kernel will create a new route and neighbour for every new
> address under 2001:db8::/64 that we send a packet to
> (potentially, 2^64 routes and neighbours).
>
> Under load, we immediately get the infamous "Neighbour table overflow"
> message and machine eventually crash.
>
> This does not happen if we specify a next-hop explicitly, like so:
>
> ip -6 route add 2001:db8::/64 via fe80:: dev sit1
>
> Same problem happens if we use routes to loopback.
>
> Idea of this patch is to move existing SIT related code from
> addrconf_prefix_route() to a more generic one in ip6_route_add().
>
> This permits ip6_pol_route() to clone route instead of calling
> rt6_alloc_cow() and allocate a neighbour.
>
> Many thanks to Lorenzo for his help and suggestions.
>
> Reported-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
This patch lacks the desired effect without your clone-caching-removal
patch, which I will not apply.
Therefore it doesn't make any sense to apply this either, as it won't
fix the stated problem.
Doing a proper conversion of ipv6 to ref-count-less neigh's will solve
this problem and allow all of the clone/cow caching code to be elided
for the majority of cases and is the correct approach to these problems.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 net-next] ipv6: prevent useless neigh alloc on PTP or lo routes
2012-09-13 21:13 ` David Miller
@ 2012-09-13 21:51 ` Eric Dumazet
0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2012-09-13 21:51 UTC (permalink / raw)
To: David Miller; +Cc: netdev, lorenzo, maze, therbert, willemb
On Thu, 2012-09-13 at 17:13 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 13 Sep 2012 05:15:58 +0200
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > We have special handling of SIT devices in addrconf_prefix_route()
> > to avoid allocating a neighbour for each destination.
> >
> > If routing entry is :
> >
> > ip -6 route add 2001:db8::/64 dev sit1
> >
> > Then the kernel will create a new route and neighbour for every new
> > address under 2001:db8::/64 that we send a packet to
> > (potentially, 2^64 routes and neighbours).
> >
> > Under load, we immediately get the infamous "Neighbour table overflow"
> > message and machine eventually crash.
> >
> > This does not happen if we specify a next-hop explicitly, like so:
> >
> > ip -6 route add 2001:db8::/64 via fe80:: dev sit1
> >
> > Same problem happens if we use routes to loopback.
> >
> > Idea of this patch is to move existing SIT related code from
> > addrconf_prefix_route() to a more generic one in ip6_route_add().
> >
> > This permits ip6_pol_route() to clone route instead of calling
> > rt6_alloc_cow() and allocate a neighbour.
> >
> > Many thanks to Lorenzo for his help and suggestions.
> >
> > Reported-by: Lorenzo Colitti <lorenzo@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> This patch lacks the desired effect without your clone-caching-removal
> patch, which I will not apply.
>
> Therefore it doesn't make any sense to apply this either, as it won't
> fix the stated problem.
>
> Doing a proper conversion of ipv6 to ref-count-less neigh's will solve
> this problem and allow all of the clone/cow caching code to be elided
> for the majority of cases and is the correct approach to these problems.
This seems quite different patches to me. Addressing two problems.
This patch is about not allocating a neighbour for a given route, but
reusing one existing neighbour. What could be possibly wrong with this,
since all neighbours are exactly the sames ?
I understand we can make it better with big surgery later, but right now
it seems quite reasonable.
This is already done (in part) for SIT devices, which are a subclass of
PtP device.
For the other patch, it seems problem was introduced in 2.6.38 when
CLONE_OFFLINK_ROUTE was removed in commit d80bc0fd26.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-09-13 21:51 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-12 12:01 [PATCH net-next 1/2] ipv6: force RTF_NONEXTHOP for SIT device Eric Dumazet
2012-09-12 20:53 ` Maciej Żenczykowski
2012-09-13 2:59 ` Eric Dumazet
2012-09-13 3:15 ` [PATCH v2 net-next] ipv6: prevent useless neigh alloc on PTP or lo routes Eric Dumazet
2012-09-13 21:13 ` David Miller
2012-09-13 21:51 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).