From: David Miller <davem@davemloft.net>
To: eguzovsky@gmail.com
Cc: berni@birkenwald.de, dlstevens@us.ibm.com, pekkas@netcore.fi,
netdev@vger.kernel.org
Subject: Re: [IPv6] "sendmsg: invalid argument" to multicast group after some time
Date: Mon, 29 Dec 2008 23:52:30 -0800 (PST)
Message-ID: <20081229.235230.53178944.davem@davemloft.net>
Eduard, thanks for your analysis and RFC patch.
I agree this is an ugly situation.
Looking over this area, the real problem is that the neighbour cache
can't do anything to apply back pressure on the routing cache when the
routing cache fills up with essentially unused multicast entries like this.
When we hit the upper limits (such as gc_thresh3) for the neighbour
cache, it tries to do things like neigh_forced_gc().
But this won't accomplish anything: all of these ipv6 multicast
routes hold a reference on the neigh entries filling up the table, so
the forced GC won't be able to liberate them.
So you're absolutely right that the route cache pollution is the core
problem.
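To make the pinning problem concrete, here is a small userspace model. It is a sketch, not kernel code, and every `model_*` name in it is hypothetical. Each neighbour entry counts one reference for the table itself plus one per cached route pointing at it, and the forced GC can only release entries whose sole reference is the table's own.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. refcnt == 1 means only the table holds
 * the entry; each cached route pointing at it adds one more. */
#define MODEL_TABLE_SIZE 8

struct model_neigh {
	bool in_use;
	int refcnt;
};

static struct model_neigh model_table[MODEL_TABLE_SIZE];

/* Fill every slot, each entry pinned by `routes` cached routes. */
static void model_fill(int routes)
{
	for (int i = 0; i < MODEL_TABLE_SIZE; i++) {
		model_table[i].in_use = true;
		model_table[i].refcnt = 1 + routes;
	}
}

/* Drop one route reference from every entry, as shrinking the
 * route cache would. */
static void model_flush_routes(void)
{
	for (int i = 0; i < MODEL_TABLE_SIZE; i++) {
		if (model_table[i].in_use) {
			assert(model_table[i].refcnt > 1);
			model_table[i].refcnt--;
		}
	}
}

/* Forced GC: free only entries held solely by the table.
 * Returns the number of entries released. */
static int model_forced_gc(void)
{
	int freed = 0;

	for (int i = 0; i < MODEL_TABLE_SIZE; i++) {
		if (model_table[i].in_use && model_table[i].refcnt == 1) {
			model_table[i].in_use = false;
			model_table[i].refcnt = 0;
			freed++;
		}
	}
	return freed;
}
```

With every entry pinned by a cached route, model_forced_gc() frees nothing; only after the route references are dropped, which is what shrinking the route cache achieves, does it reclaim the whole table.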
Looking at the IPV4 routing cache we have code which goes:
	int err = arp_bind_neighbour(&rt->u.dst);
	if (err) {
		...
		/* Neighbour tables are full and nothing
		   can be released. Try to shrink route cache,
		   it is most likely it holds some neighbour records.
		 */
and then proceeds to try and forcefully flush some routing cache
entries.
So the real fix is that IPV6 should do something similar.
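A rough userspace model of that back-pressure loop follows, under loudly simplified assumptions: struct model_state and all `model_*` functions are hypothetical, the toy route GC only mimics the role of ip6_rt_gc_min_interval and ip6_rt_gc_elasticity by analogy, and `may_retry` stands in for `!in_softirq()`. When a neighbour allocation fails, it makes route GC aggressive, runs it once, restores the tunables and retries.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; every name here is hypothetical. */
#define MODEL_NEIGH_CAPACITY 4

struct model_state {
	int neigh_used;        /* entries in the neighbour table */
	int pinned_by_routes;  /* entries referenced by cached routes */
	int gc_elasticity;     /* analogue of ip6_rt_gc_elasticity (>= 1) */
	int gc_min_interval;   /* analogue of ip6_rt_gc_min_interval */
};

/* Toy route-cache GC: rate limited while min_interval > 0; otherwise it
 * keeps at most elasticity - 1 routes, releasing their neighbour refs. */
static void model_route_gc(struct model_state *s)
{
	assert(s->gc_elasticity >= 1);
	if (s->gc_min_interval > 0)
		return;
	if (s->pinned_by_routes >= s->gc_elasticity)
		s->pinned_by_routes = s->gc_elasticity - 1;
}

/* Toy neighbour allocation: when full, a forced GC can reclaim only
 * the unpinned entries; if that is not enough, fail. */
static bool model_neigh_alloc(struct model_state *s)
{
	if (s->neigh_used >= MODEL_NEIGH_CAPACITY)
		s->neigh_used = s->pinned_by_routes;
	if (s->neigh_used >= MODEL_NEIGH_CAPACITY)
		return false;
	s->neigh_used++;
	return true;
}

/* Back-pressure loop: on failure, temporarily make route GC aggressive,
 * run it, restore the tunables and retry. may_retry plays the role of
 * `attempts = !in_softirq()` in the patch. */
static bool model_bind_neigh(struct model_state *s, bool may_retry)
{
	int attempts = may_retry ? 1 : 0;

	while (!model_neigh_alloc(s)) {
		if (attempts-- <= 0)
			return false;	/* "Neighbour table overflow." */

		int saved_elasticity = s->gc_elasticity;
		int saved_min_interval = s->gc_min_interval;

		s->gc_elasticity = 1;
		s->gc_min_interval = 0;
		model_route_gc(s);
		s->gc_elasticity = saved_elasticity;
		s->gc_min_interval = saved_min_interval;
	}
	return true;
}
```

With `may_retry == false` (softirq, in the patch's terms) the caller gets the overflow failure immediately; otherwise one aggressive route GC pass unpins the neighbour entries and the retry succeeds, with the tunables back at their saved values afterwards.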
Something like the following (untested) patch:
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index ce532f2..1459ed3 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -155,9 +155,9 @@ static inline struct neighbour * ndisc_get_neigh(struct net_device *dev, const s
{
if (dev)
- return __neigh_lookup(&nd_tbl, addr, dev, 1);
+ return __neigh_lookup_errno(&nd_tbl, addr, dev);
- return NULL;
+ return ERR_PTR(-ENODEV);
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 18c486c..0db4129 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -627,6 +627,9 @@ static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, struct in6_addr *dad
rt = ip6_rt_copy(ort);
if (rt) {
+ struct neighbour *neigh;
+ int attempts = !in_softirq();
+
if (!(rt->rt6i_flags&RTF_GATEWAY)) {
if (rt->rt6i_dst.plen != 128 &&
ipv6_addr_equal(&rt->rt6i_dst.addr, daddr))
@@ -646,7 +649,35 @@ static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, struct in6_addr *dad
}
#endif
- rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway);
+ retry:
+ neigh = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway);
+ if (IS_ERR(neigh)) {
+ struct net *net = dev_net(rt->rt6i_dev);
+ int saved_rt_min_interval =
+ net->ipv6.sysctl.ip6_rt_gc_min_interval;
+ int saved_rt_elasticity =
+ net->ipv6.sysctl.ip6_rt_gc_elasticity;
+
+ if (attempts-- > 0) {
+ net->ipv6.sysctl.ip6_rt_gc_elasticity = 1;
+ net->ipv6.sysctl.ip6_rt_gc_min_interval = 0;
+
+ ip6_dst_gc(net->ipv6.ip6_dst_ops);
+
+ net->ipv6.sysctl.ip6_rt_gc_elasticity =
+ saved_rt_elasticity;
+ net->ipv6.sysctl.ip6_rt_gc_min_interval =
+ saved_rt_min_interval;
+ goto retry;
+ }
+
+ if (net_ratelimit())
+ printk(KERN_WARNING
+ "Neighbour table overflow.\n");
+ dst_free(&rt->u.dst);
+ return NULL;
+ }
+ rt->rt6i_nexthop = neigh;
}
@@ -945,8 +976,11 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
dev_hold(dev);
if (neigh)
neigh_hold(neigh);
- else
+ else {
neigh = ndisc_get_neigh(dev, addr);
+ if (IS_ERR(neigh))
+ neigh = NULL;
+ }
rt->rt6i_dev = dev;
rt->rt6i_idev = idev;
@@ -1887,6 +1921,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
{
struct net *net = dev_net(idev->dev);
struct rt6_info *rt = ip6_dst_alloc(net->ipv6.ip6_dst_ops);
+ struct neighbour *neigh;
if (rt == NULL)
return ERR_PTR(-ENOMEM);
@@ -1909,11 +1944,18 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
rt->rt6i_flags |= RTF_ANYCAST;
else
rt->rt6i_flags |= RTF_LOCAL;
- rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway);
- if (rt->rt6i_nexthop == NULL) {
+ neigh = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway);
+ if (IS_ERR(neigh)) {
dst_free(&rt->u.dst);
- return ERR_PTR(-ENOMEM);
+
+ /* We are casting this because that is the return
+ * value type. But an errno encoded pointer is the
+ * same regardless of the underlying pointer type,
+ * and that's what we are returning. So this is OK.
+ */
+ return (struct rt6_info *) neigh;
}
+ rt->rt6i_nexthop = neigh;
ipv6_addr_copy(&rt->rt6i_dst.addr, addr);
rt->rt6i_dst.plen = 128;
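The cast in addrconf_dst_alloc() relies on the kernel's error-pointer convention from include/linux/err.h: errno values are encoded in the last MAX_ERRNO (4095) addresses, so the encoding survives a cast to any pointer type. The sketch below re-creates that convention in userspace under hypothetical `model_*` names; it is an illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer idea; these are
 * hypothetical local helpers, not <linux/err.h> itself. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The top MODEL_MAX_ERRNO addresses are never valid objects. */
static inline bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_neigh { char name[8]; };
struct model_rt { char flags; };

/* Like the patched ndisc_get_neigh(): returns an encoded errno,
 * never NULL, on failure. */
static struct model_neigh *model_get_neigh(bool have_dev)
{
	static struct model_neigh n;

	return have_dev ? &n : model_err_ptr(-ENODEV);
}

/* Like addrconf_dst_alloc(): pass the encoded errno up through a
 * different pointer type; the address value is unchanged by the cast. */
static struct model_rt *model_rt_alloc(bool have_dev)
{
	static struct model_rt rt;
	struct model_neigh *neigh = model_get_neigh(have_dev);

	if (model_is_err(neigh))
		return (struct model_rt *)neigh;
	return &rt;
}
```

Because model_is_err() and model_ptr_err() only look at the numeric address, the (struct model_rt *) cast leaves the encoded -ENODEV intact, which is the point the comment in the patch makes.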