netdev.vger.kernel.org archive mirror
From: Marcelo Ricardo Leitner <mleitner@redhat.com>
To: Debabrata Banerjee <dbavatar@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	"jmorris@namei.org" <jmorris@namei.org>,
	"yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>,
	Patrick McHardy <kaber@trash.net>,
	"Banerjee, Debabrata" <dbanerje@akamai.com>,
	Joshua Hunt <johunt@akamai.com>
Subject: Re: [patch net] ipv6: do not create neighbor entries for local delivery
Date: Thu, 08 Aug 2013 17:45:21 -0300
Message-ID: <52040361.5020200@redhat.com>
In-Reply-To: <20130808201627.GI14001@order.stressinduktion.org>

On 08-08-2013 17:16, Hannes Frederic Sowa wrote:
> On Thu, Aug 08, 2013 at 09:47:02PM +0200, Hannes Frederic Sowa wrote:
>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>
>>>> They will be created at output, if ever needed. This avoids creating
>>>> empty neighbor entries when TPROXYing/Forwarding packets for addresses
>>>> that are not even directly reachable.
>>>>
>>>> Note that IPv4 already handles it this way. No neighbor entries are
>>>> created for local input.
>>>>
>>>> Tested by myself and customer.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>> ---
>>>>   net/ipv6/route.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>>> index e229a3b..363d8b7 100644
>>>> --- a/net/ipv6/route.c
>>>> +++ b/net/ipv6/route.c
>>>> @@ -928,7 +928,7 @@ restart:
>>>>          dst_hold(&rt->dst);
>>>>          read_unlock_bh(&table->tb6_lock);
>>>>
>>>> -       if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
>>>> +       if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
>>>>                  nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>>>>          else if (!(rt->dst.flags & DST_HOST))
>>>>                  nrt = rt6_alloc_clone(rt, &fl6->daddr);
>>>
>>>
>>>
>>> I'm not sure this patch is doing the right thing. It seems to break
>>> IPv6 loopback functionality; it is no longer equivalent to IPv4, as
>>> stated above. It doesn't just stop neighbor creation, it also stops
>>> cached route creation. That seems like a scary change for a stable tree.
>>> See below:
>>>
>>> $ ip -4 route show local
>>> local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
>>>
>>> This local route enables us to use the whole loopback network, any
>>> address inside 127.0.0.0/8 will work.
>>>
>>> $ ping -c1 127.0.0.9
>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>>
>>> --- 127.0.0.9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>>
>>> This also used to work equivalently for IPv6 local loopback routes:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>>
>>> --- 2001::9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
>>>
>>> However, with this patch, it is badly broken:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> ping: sendmsg: Invalid argument
>>
>> I do think that the patch above is fine. I wonder why you get a blackhole
>> route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was
>> way too aggressive?
>
> Ah, sorry: before the rt->n removal, everything worked a bit
> differently. rt6_alloc_cow did fill rt->n back then. To fix both things
> we would have to bind a neighbour towards the loopback interface into
> the non-cloned rt6_info if it feeds packets towards lo. That is a pretty
> big change for old stable kernels, I guess. :/
>
> Marcelo, any idea how to deal with this? My guess would be a revert, but I
> don't know the impact on the tproxy issue.

Good question :) Nothing so far, sorry.

The impact would be a return to the previous state, in which a tproxy server 
is limited by the neighbor cache size. Simply making the cache larger is not a 
good option, as it introduces big latency spikes during cleanup.

I'll have to rebuild the tproxy environment I used before to test this again, 
which will take a while. I'll keep you posted.

Cheers,
Marcelo

Thread overview: 17+ messages
2013-01-30  8:26 [patch net] ipv6: do not create neighbor entries for local delivery Jiri Pirko
2013-01-31  1:26 ` David Miller
2013-08-08 18:45 ` Debabrata Banerjee
2013-08-08 19:01   ` Hannes Frederic Sowa
2013-08-08 19:02     ` Marcelo Ricardo Leitner
2013-08-08 19:06       ` Hannes Frederic Sowa
2013-08-08 19:11         ` Marcelo Ricardo Leitner
2013-08-08 19:16           ` Hannes Frederic Sowa
2013-08-08 19:23             ` Marcelo Ricardo Leitner
2013-08-08 19:19     ` Debabrata Banerjee
2013-08-08 19:47   ` Hannes Frederic Sowa
2013-08-08 20:16     ` Hannes Frederic Sowa
2013-08-08 20:45       ` Marcelo Ricardo Leitner [this message]
2013-08-08 20:46         ` Marcelo Ricardo Leitner
2013-08-12 18:09       ` Marcelo Ricardo Leitner
2013-08-12 22:26         ` Hannes Frederic Sowa
2013-08-13 12:48           ` Marcelo Ricardo Leitner
