From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Ricardo Leitner Subject: Re: [patch net] ipv6: do not create neighbor entries for local delivery Date: Thu, 08 Aug 2013 17:46:41 -0300 Message-ID: <520403B1.9010000@redhat.com> References: <20130130082608.GA1604@minipsycho.orion> <20130808194702.GH14001@order.stressinduktion.org> <20130808201627.GI14001@order.stressinduktion.org> <52040361.5020200@redhat.com> Reply-To: mleitner@redhat.com Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Jiri Pirko , "davem@davemloft.net" , "netdev@vger.kernel.org" , Alexey Kuznetsov , "jmorris@namei.org" , "yoshfuji@linux-ipv6.org" , Patrick McHardy , "Banerjee, Debabrata" , Joshua Hunt To: Debabrata Banerjee Return-path: Received: from mx1.redhat.com ([209.132.183.28]:28044 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966479Ab3HHUrV (ORCPT ); Thu, 8 Aug 2013 16:47:21 -0400 In-Reply-To: <52040361.5020200@redhat.com> Sender: netdev-owner@vger.kernel.org List-ID: Em 08-08-2013 17:45, Marcelo Ricardo Leitner escreveu: > Em 08-08-2013 17:16, Hannes Frederic Sowa escreveu: >> On Thu, Aug 08, 2013 at 09:47:02PM +0200, Hannes Frederic Sowa wrote: >>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote: >>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko wrote: >>>>> From: Marcelo Ricardo Leitner >>>>> >>>>> They will be created at output, if ever needed. This avoids creating >>>>> empty neighbor entries when TPROXYing/Forwarding packets for addresses >>>>> that are not even directly reachable. >>>>> >>>>> Note that IPv4 already handles it this way. No neighbor entries are >>>>> created for local input. >>>>> >>>>> Tested by myself and customer. >>>>> >>>>> Signed-off-by: Jiri Pirko >>>>> Signed-off-by: Marcelo Ricardo Leitner >>>>> --- >>>>> net/ipv6/route.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c >>>>> index e229a3b..363d8b7 100644 >>>>> --- a/net/ipv6/route.c >>>>> +++ b/net/ipv6/route.c >>>>> @@ -928,7 +928,7 @@ restart: >>>>> dst_hold(&rt->dst); >>>>> read_unlock_bh(&table->tb6_lock); >>>>> >>>>> - if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP)) >>>>> + if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL))) >>>>> nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr); >>>>> else if (!(rt->dst.flags & DST_HOST)) >>>>> nrt = rt6_alloc_clone(rt, &fl6->daddr); >>>> >>>> >>>> >>>> I'm not sure this patch is doing the right thing. It seems to break >>>> IPv6 loopback functionality, it is no longer equivalent to IPv4, as >>>> stated above. It doesn't just stop neighbor creation but it stops >>>> cached route creation. Seems like a scary change for a stable tree. >>>> See below: >>>> >>>> $ ip -4 route show local >>>> local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1 >>>> >>>> This local route enables us to use the whole loopback network, any >>>> address inside 127.0.0.0/8 will work. >>>> >>>> $ ping -c1 127.0.0.9 >>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data. >>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms >>>> >>>> --- 127.0.0.9 ping statistics --- >>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms >>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms >>>> >>>> This also used to work equivalently for IPv6 local loopback routes: >>>> >>>> $ ip -6 route add local 2001:::/64 dev lo >>>> $ ping6 -c1 2001::9 >>>> PING 2001::9(2001::9) 56 data bytes >>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms >>>> >>>> --- 2001::9 ping statistics --- >>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms >>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms >>>> >>>> However with this patch, this is very broken: >>>> >>>> $ ip -6 route add local 2001::/64 dev lo >>>> $ ping6 -c1 2001::9 >>>> PING 2001::9(2001::9) 56 data bytes >>>> ping: sendmsg: Invalid argument >>> >>> I do think that the patch above is fine. I wonder why you get a blackhole >>> route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was >>> way too aggressive? >> >> Ah sorry, before rt->n removal everything worked a bit >> different. rt6_alloc_cow did fill rt->n back then. To fix both things >> we would have to bind a neighbour towards the loopback interface into >> the non-cloned rt6_info if it feeds packets towards lo. Pretty big change for >> old stable kernels, I guess. :/ >> >> Marcelo, any idea how to deal with this? My guess would be a revert, but I >> don't know the impact on the tproxy issue. > > Good question :) Nothing so far, sorry. > > The impact would be returning to the previous state, that a tproxy server is > limited to neighbor cache size. And just making it larger is not a good option > as it will introduce big latency spikes during cleanup. > > I'll have to rebuild the tproxy environment I had to test this out again, it > will take a while. Keep you posted. Aye, and thanks for assisting on this, Hannes, appreciated. Cheers, Marcelo