netdev.vger.kernel.org archive mirror
* [RFC Patch net-next] ipv6: look up neighbours on demand in ip6_finish_output2()
@ 2013-01-14 13:35 Cong Wang
  2013-01-14 16:29 ` Eric Dumazet
  2013-01-14 18:30 ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Cong Wang @ 2013-01-14 13:35 UTC (permalink / raw)
  To: netdev; +Cc: Roland Dreier, David S. Miller, Cong Wang

From: Cong Wang <xiyou.wangcong@gmail.com>

Roland reported that we may use a "stray" neigh entry if someone else
holds a refcount on this entry, which then leads to neigh_blackhole().

And David said:

"Another reason we must make ipv6 like ipv4, which looks up neighbours
on demand at packet output time rather than caching them in the route
entries."

Thus, we just follow what IPv4 does, call __ipv6_neigh_lookup_noref()
to avoid using the "stray" neighbour.

Roland, could you test if this could fix your problem? I don't
have the environment.

Reported-by: Roland Dreier <roland@purestorage.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

---
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 23b3a7c..a5fe1b3 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -148,14 +148,16 @@ static inline u32 ndisc_hashfn(const void *pkey, const struct net_device *dev, _
 		(p32[3] * hash_rnd[3]));
 }
 
-static inline struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl, struct net_device *dev, const void *pkey)
+static inline
+struct neighbour *__ipv6_neigh_lookup_noref(struct neigh_table *tbl,
+					    struct net_device *dev,
+					    const void *pkey)
 {
 	struct neigh_hash_table *nht;
 	const u32 *p32 = pkey;
 	struct neighbour *n;
 	u32 hash_val;
 
-	rcu_read_lock_bh();
 	nht = rcu_dereference_bh(tbl->nht);
 	hash_val = ndisc_hashfn(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
 	for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]);
@@ -165,11 +167,23 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl, str
 		if (n->dev == dev &&
 		    ((n32[0] ^ p32[0]) | (n32[1] ^ p32[1]) |
 		     (n32[2] ^ p32[2]) | (n32[3] ^ p32[3])) == 0) {
-			if (!atomic_inc_not_zero(&n->refcnt))
-				n = NULL;
 			break;
 		}
 	}
+
+	return n;
+}
+
+static inline
+struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl,
+				      struct net_device *dev, const void *pkey)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv6_neigh_lookup_noref(tbl, dev, pkey);
+	if (n && (!atomic_inc_not_zero(&n->refcnt)))
+		n = NULL;
 	rcu_read_unlock_bh();
 
 	return n;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9250c69..20f2e86 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -89,11 +89,12 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	struct net_device *dev = dst->dev;
 	struct neighbour *neigh;
 	struct rt6_info *rt;
+	const void *daddr = &ipv6_hdr(skb)->daddr;
 
 	skb->protocol = htons(ETH_P_IPV6);
 	skb->dev = dev;
 
-	if (ipv6_addr_is_multicast(&ipv6_hdr(skb)->daddr)) {
+	if (ipv6_addr_is_multicast(daddr)) {
 		struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
 
 		if (!(dev->flags & IFF_LOOPBACK) && sk_mc_loop(skb->sk) &&
@@ -124,9 +125,17 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	}
 
 	rt = (struct rt6_info *) dst;
-	neigh = rt->n;
-	if (neigh)
+	rcu_read_lock_bh();
+	if (rt->rt6i_flags & RTF_GATEWAY)
+		daddr = &rt->rt6i_gateway;
+	neigh = __ipv6_neigh_lookup_noref(&nd_tbl, dev, daddr);
+	if (unlikely(!neigh))
+		neigh = __neigh_create(&nd_tbl, daddr, dev, false);
+	if (!IS_ERR(neigh)) {
+		rcu_read_unlock_bh();
 		return dst_neigh_output(dst, neigh, skb);
+	}
+	rcu_read_unlock_bh();
 
 	IP6_INC_STATS_BH(dev_net(dst->dev),
 			 ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
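
The ndisc.h hunk above splits the lookup into a "noref" variant and a
ref-taking wrapper. A minimal userspace sketch of that pattern follows,
assuming hypothetical names (struct entry, ref_get_unless_zero(),
lookup_noref(), lookup_get()) and a plain linked list instead of the
kernel hash table. It only models the idea behind atomic_inc_not_zero()
with C11 atomics and is not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct neighbour: a 16-byte key and a refcount. */
struct entry {
	unsigned char key[16];
	atomic_int refcnt;
	struct entry *next;
};

/* Take a reference only if the count is still non-zero. This models the
 * intent of the kernel's atomic_inc_not_zero() with a CAS loop. */
static bool ref_get_unless_zero(atomic_int *cnt)
{
	int old = atomic_load(cnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return true;
	}
	return false;
}

/* "noref" lookup: returns a borrowed pointer and touches no refcount.
 * The caller is assumed to hold whatever protects the list
 * (rcu_read_lock_bh() in the patch). */
static struct entry *lookup_noref(struct entry *head, const void *key)
{
	struct entry *e;

	for (e = head; e != NULL; e = e->next)
		if (memcmp(e->key, key, sizeof(e->key)) == 0)
			break;
	return e;
}

/* Ref-taking wrapper built on lookup_noref(): returns NULL when the
 * entry is missing or its refcount has already dropped to zero. */
static struct entry *lookup_get(struct entry *head, const void *key)
{
	struct entry *e = lookup_noref(head, key);

	if (e && !ref_get_unless_zero(&e->refcnt))
		e = NULL;
	return e;
}
```

The output path wants the cheap borrowed pointer, while callers that
keep the entry beyond the protected section need the wrapper.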


* Re: [RFC Patch net-next] ipv6: look up neighbours on demand in ip6_finish_output2()
  2013-01-14 13:35 [RFC Patch net-next] ipv6: look up neighbours on demand in ip6_finish_output2() Cong Wang
@ 2013-01-14 16:29 ` Eric Dumazet
  2013-01-14 18:30 ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2013-01-14 16:29 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Roland Dreier, David S. Miller

On Mon, 2013-01-14 at 21:35 +0800, Cong Wang wrote:
> From: Cong Wang <xiyou.wangcong@gmail.com>
> 
> Roland reported that we may use a "stray" neigh entry if someone else
> holds a refcount of this entry. Therefore leads to neigh_blackhole().
> 
> And David said:
> 
> "Another reason we must make ipv6 like ipv4, which looks up neighbours
> on demand at packet output time rather than caching them in the route
> entries."

Not sure why you quoted this, as your patch doesn't do that.

(Its rt6_bind_neighbour() part)

> 
> Thus, we just follow what IPv4 does, call __ipv6_neigh_lookup_noref()
> to avoid using the "stray" neighbour.
> 
> Roland, could you test if this could fix your problem? I don't
> have the environment.
> 
> Reported-by: Roland Dreier <roland@purestorage.com>
> Cc: Roland Dreier <roland@purestorage.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>


> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index 23b3a7c..a5fe1b3 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -148,14 +148,16 @@ static inline u32 ndisc_hashfn(const void *pkey, const struct net_device *dev, _
>  		(p32[3] * hash_rnd[3]));
>  }
>  
> -static inline struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl, struct net_device *dev, const void *pkey)
> +static inline
> +struct neighbour *__ipv6_neigh_lookup_noref(struct neigh_table *tbl,
> +					    struct net_device *dev,
> +					    const void *pkey)
>  {
> 

__ipv6_neigh_lookup_noref() should be moved to net/ipv6/route.c


> @@ -124,9 +125,17 @@ static int ip6_finish_output2(struct sk_buff *skb)
>  	}
>  
>  	rt = (struct rt6_info *) dst;
> -	neigh = rt->n;
> -	if (neigh)
> +	rcu_read_lock_bh();
> +	if (rt->rt6i_flags & RTF_GATEWAY)
> +		daddr = &rt->rt6i_gateway;
> +	neigh = __ipv6_neigh_lookup_noref(&nd_tbl, dev, daddr);
> +	if (unlikely(!neigh))
> +		neigh = __neigh_create(&nd_tbl, daddr, dev, false);
> +	if (!IS_ERR(neigh)) {
> +		rcu_read_unlock_bh();

You release the lock too soon. Take a look at IPv4 ...

>  		return dst_neigh_output(dst, neigh, skb);
> +	}
> +	rcu_read_unlock_bh();
>  
>  	IP6_INC_STATS_BH(dev_net(dst->dev),
>  			 ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);

And really, you should not post patches like this without testing them,
as testing is probably the hardest part of the equation.
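
To illustrate the locking comment above: a pointer returned by a noref
lookup is only safe to use while the read-side lock is held, so the
unlock presumably has to come after the output call rather than before
it. The following is a userspace model with hypothetical stand-ins
(model_read_lock(), model_lookup_noref(), model_output()); none of
these are kernel APIs, and the "lock" is just a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rcu_read_lock_bh() nesting depth. */
static int read_lock_depth;

static void model_read_lock(void)
{
	read_lock_depth++;
}

static void model_read_unlock(void)
{
	assert(read_lock_depth > 0);
	read_lock_depth--;
}

struct model_neigh {
	int id;
};

static struct model_neigh table_entry = { .id = 1 };

/* Borrowed (noref) pointer: only valid while the read lock is held. */
static struct model_neigh *model_lookup_noref(void)
{
	assert(read_lock_depth > 0);
	return &table_entry;
}

/* Stand-in for dst_neigh_output(). Returns -1 when called outside the
 * read-side section, where the borrowed pointer is no longer protected. */
static int model_output(const struct model_neigh *n)
{
	if (read_lock_depth == 0)
		return -1;
	return n->id > 0 ? 0 : -1;
}

/* Ordering as in the posted hunk: unlock, then output. */
static int xmit_unlock_early(void)
{
	struct model_neigh *n;

	model_read_lock();
	n = model_lookup_noref();
	model_read_unlock();
	return model_output(n);
}

/* Ordering with the unlock moved after the output call. */
static int xmit_unlock_late(void)
{
	struct model_neigh *n;
	int res;

	model_read_lock();
	n = model_lookup_noref();
	res = model_output(n);
	model_read_unlock();
	return res;
}
```

In this model xmit_unlock_early() reports the unprotected use and
xmit_unlock_late() does not; the real consequence in the kernel would be
a neighbour entry that may be freed while it is still in use.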

Thanks


* Re: [RFC Patch net-next] ipv6: look up neighbours on demand in ip6_finish_output2()
  2013-01-14 13:35 [RFC Patch net-next] ipv6: look up neighbours on demand in ip6_finish_output2() Cong Wang
  2013-01-14 16:29 ` Eric Dumazet
@ 2013-01-14 18:30 ` David Miller
  2013-01-15  2:28   ` Cong Wang
  1 sibling, 1 reply; 5+ messages in thread
From: David Miller @ 2013-01-14 18:30 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, roland


This is a very incomplete patch.

If this change were so simple, it would have been done by someone
else a long time ago.

You must, in addition to the incredibly obvious changes in the packet
output path, completely eliminate the caching of the neighbour entry
in the ipv6 routes themselves.

This means replacing every rt6->n access or test with something
equivalent.
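
One possible shape for such an equivalent, sketched in userspace C: a
small helper that derives the neighbour lookup key from the route, the
same way the posted ip6_finish_output2() hunk does (gateway address for
RTF_GATEWAY routes, packet destination otherwise). The names
model_route, model_addr, model_nexthop() and MODEL_RTF_GATEWAY are
hypothetical and only stand in for the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for RTF_GATEWAY. */
#define MODEL_RTF_GATEWAY 0x0002u

/* Hypothetical stand-in for struct in6_addr. */
struct model_addr {
	unsigned char bytes[16];
};

/* Hypothetical stand-in for struct rt6_info, without a cached neighbour. */
struct model_route {
	unsigned int flags;
	struct model_addr gateway;
};

/* Pick the address to resolve at output time: the gateway for routes
 * that go through one, otherwise the packet's own destination. */
static const struct model_addr *
model_nexthop(const struct model_route *rt, const struct model_addr *daddr)
{
	if (rt->flags & MODEL_RTF_GATEWAY)
		return &rt->gateway;
	return daddr;
}
```

A caller would then feed the returned address to an on-demand
neighbour lookup instead of reading a pointer stored in the route.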


* Re: [RFC Patch net-next] ipv6: look up neighbours on demand in ip6_finish_output2()
  2013-01-14 18:30 ` David Miller
@ 2013-01-15  2:28   ` Cong Wang
  2013-01-15 12:22     ` YOSHIFUJI Hideaki
  0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2013-01-15  2:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, roland, Eric Dumazet

On 01/15/2013 02:30 AM, David Miller wrote:
>
> This is a very incomplete patch.
>
> If this change were so simple, it would have been done by someone
> else a long time ago.
>
> You must, in addition to the incredibly obvious changes in the packet
> output path, completely eliminate the caching of the neighbour entry
> in the ipv6 routes themselves.
>
> This means replacing every rt6->n access or test with something
> equivalent.
>

Thanks, David and Eric!

I knew this was probably incomplete; that is why I added "RFC".
Fortunately, YOSHIFUJI just sent a more complete patch:

[RFC net-next] ipv6 route: Do not attach neighbour on route.

So, please ignore mine and use his. :)


* Re: [RFC Patch net-next] ipv6: look up neighbours on demand in ip6_finish_output2()
  2013-01-15  2:28   ` Cong Wang
@ 2013-01-15 12:22     ` YOSHIFUJI Hideaki
  0 siblings, 0 replies; 5+ messages in thread
From: YOSHIFUJI Hideaki @ 2013-01-15 12:22 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev, roland, Eric Dumazet, YOSHIFUJI Hideaki

Cong Wang wrote:
> On 01/15/2013 02:30 AM, David Miller wrote:
>>
>> This is a very incomplete patch.
>>
>> If this change were so simple, it would have been done by someone
>> else a long time ago.
>>
>> You must, in addition to the incredibly obvious changes in the packet
>> output path, completely eliminate the caching of the neighbour entry
>> in the ipv6 routes themselves.
>>
>> This means replacing every rt6->n access or test with something
>> equivalent.
>>
> 
> Thanks, David and Eric!
> 
> I knew this is probably incomplete, that is why I added "RFC". Fortunately, YOSHIFUJI just sent a more complete patch:
> 
> [RFC net-next] ipv6 route: Do not attach neighbour on route.
> 
> So, please ignore mine and use his. :)

Well, in fact, it didn't work.  It really needs further work.

--yoshfuji

