From: Cong Wang <xiyou.wangcong@gmail.com>
To: netdev@vger.kernel.org
Cc: Roland Dreier <roland@purestorage.com>,
	"David S. Miller" <davem@davemloft.net>,
	Cong Wang <xiyou.wangcong@gmail.com>
Subject: [RFC Patch net-next] ipv6: look up neighbours on demand in ip6_finish_output2()
Date: Mon, 14 Jan 2013 21:35:15 +0800
Message-ID: <1358170515-1383-1-git-send-email-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>

Roland reported that we may end up using a "stray" neigh entry if
someone else still holds a refcount on it, in which case output ends
up in neigh_blackhole().

And David said:

"Another reason we must make ipv6 like ipv4, which looks up neighbours
on demand at packet output time rather than caching them in the route
entries."

Thus, we just follow what IPv4 does and call
__ipv6_neigh_lookup_noref() at output time, so that we avoid using the
"stray" neighbour.
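
To illustrate the split between a lookup that takes no reference and
a wrapper that does, here is a minimal userspace sketch. It is not the
kernel code: struct neigh, lookup_noref(), lookup_ref(), inc_not_zero()
and neigh_sketch_selftest() are hypothetical stand-ins, the hash is a
toy, and there is no RCU here at all. In the kernel, the pointer
returned by the noref variant may only be used inside the caller's
rcu_read_lock_bh() section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct neighbour. */
struct neigh {
	uint32_t key[4];	/* IPv6 address as four 32-bit words */
	int ifindex;		/* stands in for n->dev */
	atomic_int refcnt;	/* 0 means the entry is being torn down */
	struct neigh *next;
};

#define NBUCKETS 8
static struct neigh *buckets[NBUCKETS];

/* Toy hash, only for this sketch; the kernel uses ndisc_hashfn(). */
static unsigned int hashfn(const uint32_t *k, int ifindex)
{
	return (k[0] ^ k[1] ^ k[2] ^ k[3] ^ (uint32_t)ifindex) % NBUCKETS;
}

/* Increment *v unless it is zero, in the spirit of atomic_inc_not_zero(). */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Lookup without touching the refcount; returns NULL on a miss. */
static struct neigh *lookup_noref(const uint32_t *k, int ifindex)
{
	struct neigh *n;

	for (n = buckets[hashfn(k, ifindex)]; n; n = n->next) {
		if (n->ifindex == ifindex &&
		    ((n->key[0] ^ k[0]) | (n->key[1] ^ k[1]) |
		     (n->key[2] ^ k[2]) | (n->key[3] ^ k[3])) == 0)
			break;
	}
	return n;
}

/* Lookup returning a referenced entry, or NULL if missing or dying. */
static struct neigh *lookup_ref(const uint32_t *k, int ifindex)
{
	struct neigh *n = lookup_noref(k, ifindex);

	if (n && !inc_not_zero(&n->refcnt))
		n = NULL;
	return n;
}

static void insert(struct neigh *n)
{
	unsigned int h = hashfn(n->key, n->ifindex);

	n->next = buckets[h];
	buckets[h] = n;
}

/* Exercises both lookups; returns 0 if every check passes. */
int neigh_sketch_selftest(void)
{
	static struct neigh a = { .key = { 0x20010db8, 0, 0, 1 }, .ifindex = 2 };
	const uint32_t missing[4] = { 0x20010db8, 0, 0, 2 };

	memset(buckets, 0, sizeof(buckets));
	atomic_store(&a.refcnt, 1);
	insert(&a);

	assert(lookup_noref(a.key, 2) == &a);	/* hit, refcnt untouched */
	assert(atomic_load(&a.refcnt) == 1);
	assert(lookup_ref(a.key, 2) == &a);	/* hit, refcnt bumped */
	assert(atomic_load(&a.refcnt) == 2);
	assert(lookup_noref(missing, 2) == NULL);
	assert(lookup_noref(a.key, 3) == NULL);	/* same address, other device */

	atomic_store(&a.refcnt, 0);		/* simulate a dying entry */
	assert(lookup_ref(a.key, 2) == NULL);	/* wrapper refuses it */
	return 0;
}
```

Calling neigh_sketch_selftest() from a main() is enough to try it out.
The point of the split is that a caller on the output path can skip the
refcount traffic entirely, while callers that need to keep the entry
around still go through the reference-taking wrapper.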

Roland, could you test whether this fixes your problem? I don't have
an environment to reproduce it.

Reported-by: Roland Dreier <roland@purestorage.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

---
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 23b3a7c..a5fe1b3 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -148,14 +148,16 @@ static inline u32 ndisc_hashfn(const void *pkey, const struct net_device *dev, _
 		(p32[3] * hash_rnd[3]));
 }
 
-static inline struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl, struct net_device *dev, const void *pkey)
+static inline
+struct neighbour *__ipv6_neigh_lookup_noref(struct neigh_table *tbl,
+					    struct net_device *dev,
+					    const void *pkey)
 {
 	struct neigh_hash_table *nht;
 	const u32 *p32 = pkey;
 	struct neighbour *n;
 	u32 hash_val;
 
-	rcu_read_lock_bh();
 	nht = rcu_dereference_bh(tbl->nht);
 	hash_val = ndisc_hashfn(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
 	for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]);
@@ -165,11 +167,23 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl, str
 		if (n->dev == dev &&
 		    ((n32[0] ^ p32[0]) | (n32[1] ^ p32[1]) |
 		     (n32[2] ^ p32[2]) | (n32[3] ^ p32[3])) == 0) {
-			if (!atomic_inc_not_zero(&n->refcnt))
-				n = NULL;
 			break;
 		}
 	}
+
+	return n;
+}
+
+static inline
+struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl,
+				      struct net_device *dev, const void *pkey)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv6_neigh_lookup_noref(tbl, dev, pkey);
+	if (n && (!atomic_inc_not_zero(&n->refcnt)))
+		n = NULL;
 	rcu_read_unlock_bh();
 
 	return n;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9250c69..20f2e86 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -89,11 +89,12 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	struct net_device *dev = dst->dev;
 	struct neighbour *neigh;
 	struct rt6_info *rt;
+	const void *daddr = &ipv6_hdr(skb)->daddr;
 
 	skb->protocol = htons(ETH_P_IPV6);
 	skb->dev = dev;
 
-	if (ipv6_addr_is_multicast(&ipv6_hdr(skb)->daddr)) {
+	if (ipv6_addr_is_multicast(daddr)) {
 		struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
 
 		if (!(dev->flags & IFF_LOOPBACK) && sk_mc_loop(skb->sk) &&
@@ -124,9 +125,17 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	}
 
 	rt = (struct rt6_info *) dst;
-	neigh = rt->n;
-	if (neigh)
+	rcu_read_lock_bh();
+	if (rt->rt6i_flags & RTF_GATEWAY)
+		daddr = &rt->rt6i_gateway;
+	neigh = __ipv6_neigh_lookup_noref(&nd_tbl, dev, daddr);
+	if (unlikely(!neigh))
+		neigh = __neigh_create(&nd_tbl, daddr, dev, false);
+	if (!IS_ERR(neigh)) {
+		rcu_read_unlock_bh();
 		return dst_neigh_output(dst, neigh, skb);
+	}
+	rcu_read_unlock_bh();
 
 	IP6_INC_STATS_BH(dev_net(dst->dev),
 			 ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
