From: Marcelo Ricardo Leitner <mleitner@redhat.com>
To: Debabrata Banerjee <dbavatar@gmail.com>,
Jiri Pirko <jiri@resnulli.us>,
"davem@davemloft.net" <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
"jmorris@namei.org" <jmorris@namei.org>,
"yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>,
Patrick McHardy <kaber@trash.net>,
"Banerjee, Debabrata" <dbanerje@akamai.com>,
Joshua Hunt <johunt@akamai.com>
Subject: Re: [patch net] ipv6: do not create neighbor entries for local delivery
Date: Thu, 08 Aug 2013 17:45:21 -0300
Message-ID: <52040361.5020200@redhat.com>
In-Reply-To: <20130808201627.GI14001@order.stressinduktion.org>
On 08-08-2013 17:16, Hannes Frederic Sowa wrote:
> On Thu, Aug 08, 2013 at 09:47:02PM +0200, Hannes Frederic Sowa wrote:
>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>
>>>> They will be created at output, if ever needed. This avoids creating
>>>> empty neighbor entries when TPROXYing/Forwarding packets for addresses
>>>> that are not even directly reachable.
>>>>
>>>> Note that IPv4 already handles it this way. No neighbor entries are
>>>> created for local input.
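To make the resource argument concrete, here is a toy model in C. It is not kernel code: the table, its capacity of 4, the gateway address and all function names are hypothetical. It assumes a fixed-size neighbour table, in the spirit of the hard limit Linux enforces through the net.ipv6.neigh.default.gc_thresh3 sysctl, ignores garbage collection, and counts how many locally delivered (for example TPROXYed) flows would fail when every destination gets an entry on input, compared with creating entries only for the real next hop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NEIGH_CAPACITY 4u   /* hypothetical, tiny on purpose */
#define GATEWAY_ADDR 1000u  /* hypothetical next hop for replies */

/* Toy neighbour table keyed by integer "addresses". */
struct toy_neigh_table {
    unsigned int addrs[NEIGH_CAPACITY];
    size_t used;
};

/* Look up addr, creating an entry if absent; false when the table is full. */
static bool toy_neigh_lookup_create(struct toy_neigh_table *t, unsigned int addr)
{
    assert(t->used <= NEIGH_CAPACITY);
    for (size_t i = 0; i < t->used; i++)
        if (t->addrs[i] == addr)
            return true;
    if (t->used == NEIGH_CAPACITY)
        return false;
    t->addrs[t->used++] = addr;
    return true;
}

/* Number of flows (destinations 0..flows-1) whose input path fails
 * because no neighbour entry could be created. */
static unsigned int failed_flows(unsigned int flows, bool create_on_input)
{
    struct toy_neigh_table t = { .used = 0 };
    unsigned int failed = 0;

    /* Replies leave through one gateway, which needs a real entry. */
    (void)toy_neigh_lookup_create(&t, GATEWAY_ADDR);

    for (unsigned int a = 0; a < flows; a++) {
        /* Old behaviour: the input route lookup allocates an entry for a
         * destination that is never reached on-link. */
        if (create_on_input && !toy_neigh_lookup_create(&t, a))
            failed++;
    }
    return failed;
}
```

With one slot taken by the gateway, failed_flows(10, true) returns 7 while failed_flows(10, false) returns 0: only the input-side allocations exhaust the table.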
>>>>
>>>> Tested by myself and customer.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>> ---
>>>> net/ipv6/route.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>>> index e229a3b..363d8b7 100644
>>>> --- a/net/ipv6/route.c
>>>> +++ b/net/ipv6/route.c
>>>> @@ -928,7 +928,7 @@ restart:
>>>> dst_hold(&rt->dst);
>>>> read_unlock_bh(&table->tb6_lock);
>>>>
>>>> - if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
>>>> + if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
>>>> nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>>>> else if (!(rt->dst.flags & DST_HOST))
>>>> nrt = rt6_alloc_clone(rt, &fl6->daddr);
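A simplified model of the branch changed by the patch may help. This sketch reduces rt6_info to three inputs (whether rt->n is set, the rt6i_flags bits and the dst flags); the flag values are illustrative, and choose_rt_action() and the RT_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values: only their distinctness matters here. */
#define RTF_NONEXTHOP 0x00200000u
#define RTF_LOCAL     0x80000000u
#define DST_HOST      0x0001u

enum rt_action {
    RT_COW,   /* rt6_alloc_cow(): per-destination copy that binds a neighbour */
    RT_CLONE, /* rt6_alloc_clone(): per-destination copy of the route */
    RT_AS_IS, /* use the matched route directly */
};

/* Simplified model of the branch in ip6_pol_route() touched by the patch.
 * has_neigh stands for rt->n != NULL. */
static enum rt_action choose_rt_action(bool has_neigh, unsigned int rt6i_flags,
                                       unsigned int dst_flags, bool patched)
{
    unsigned int no_cow = patched ? (RTF_NONEXTHOP | RTF_LOCAL) : RTF_NONEXTHOP;

    if (!has_neigh && !(rt6i_flags & no_cow))
        return RT_COW;
    else if (!(dst_flags & DST_HOST))
        return RT_CLONE;
    return RT_AS_IS;
}
```

Assuming a route added with `ip -6 route add local 2001::/64 dev lo` carries RTF_LOCAL, has no neighbour and is not a host route, the model returns RT_COW before the patch and RT_CLONE after it, which is the behavioural change reported below.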
>>>
>>>
>>>
>>> I'm not sure this patch is doing the right thing. It seems to break
>>> IPv6 loopback functionality, so it is no longer equivalent to IPv4 as
>>> stated above. It doesn't just stop neighbor creation; it also stops
>>> cached route creation. That seems like a scary change for a stable tree.
>>> See below:
>>>
>>> $ ip -4 route show local
>>> local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
>>>
>>> This local route enables us to use the whole loopback network, any
>>> address inside 127.0.0.0/8 will work.
>>>
>>> $ ping -c1 127.0.0.9
>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>>
>>> --- 127.0.0.9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>>
>>> This also used to work equivalently for IPv6 local loopback routes:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>>
>>> --- 2001::9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
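Both examples rely on the same property: a local route covers every address inside its prefix, not only an address assigned to an interface. The containment test can be sketched in C with POSIX inet_pton(); addr_in_prefix() and prefix_match() are hypothetical helpers, and the sketch checks a single prefix rather than reproducing the kernel's longest-prefix-match lookup.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* True if the first plen bits of the byte strings a and b are equal. */
static bool prefix_match(const unsigned char *a, const unsigned char *b,
                         unsigned int plen)
{
    unsigned int full = plen / 8, rem = plen % 8;

    if (memcmp(a, b, full) != 0)
        return false;
    if (rem == 0)
        return true;
    unsigned char mask = (unsigned char)(0xFFu << (8 - rem));
    return (a[full] & mask) == (b[full] & mask);
}

/* Does textual address addr fall inside prefix/plen for family af
 * (AF_INET or AF_INET6)? Parse errors and bad lengths return false. */
static bool addr_in_prefix(int af, const char *addr, const char *prefix,
                           unsigned int plen)
{
    unsigned char a[16], p[16]; /* large enough for either family */
    unsigned int max = (af == AF_INET) ? 32 : 128;

    if (plen > max)
        return false;
    if (inet_pton(af, addr, a) != 1 || inet_pton(af, prefix, p) != 1)
        return false;
    return prefix_match(a, p, plen);
}
```

For example, addr_in_prefix(AF_INET, "127.0.0.9", "127.0.0.0", 8) and addr_in_prefix(AF_INET6, "2001::9", "2001::", 64) are both true, which is why any address in those ranges is delivered locally.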
>>>
>>> However with this patch, this is very broken:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> ping: sendmsg: Invalid argument
>>
>> I do think that the patch above is fine. I wonder why you get a blackhole
>> route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was
>> way too aggressive?
>
> Ah sorry, before the rt->n removal everything worked a bit
> differently: rt6_alloc_cow filled rt->n back then. To fix both issues
> we would have to bind a neighbour on the loopback interface into
> the non-cloned rt6_info if it feeds packets towards lo. That's a pretty big
> change for old stable kernels, I guess. :/
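Hannes's explanation can be illustrated with a toy model (hypothetical types and functions, not kernel code). It assumes, as his reply suggests, that the COW path bound a neighbour into the per-destination copy while the clone path only inherits the parent's rt->n, which is NULL for a prefix route on lo, and that output without a neighbour fails with -EINVAL, which would match the "sendmsg: Invalid argument" error reported above. toy_bind_lo_neigh() sketches the proposed fix of binding a loopback neighbour into the non-cloned route.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Toy stand-ins for struct neighbour and struct rt6_info (hypothetical). */
struct toy_neigh { const char *dev; };
struct toy_rt    { const char *dev; struct toy_neigh *n; };

static struct toy_neigh toy_lo_neigh = { "lo" };

/* COW path of that era: the per-destination copy gets a neighbour bound
 * (every route in this toy points at lo). */
static struct toy_rt toy_alloc_cow(const struct toy_rt *ort)
{
    struct toy_rt rt = *ort;
    rt.n = &toy_lo_neigh;
    return rt;
}

/* Clone path: the copy only inherits the parent's neighbour, if any. */
static struct toy_rt toy_alloc_clone(const struct toy_rt *ort)
{
    return *ort;
}

/* Proposed fix: bind a loopback neighbour into the parent route itself
 * when it feeds packets towards lo, so that clones inherit it. */
static void toy_bind_lo_neigh(struct toy_rt *rt)
{
    if (rt->n == NULL && strcmp(rt->dev, "lo") == 0)
        rt->n = &toy_lo_neigh;
}

/* Output needs a neighbour; without one the send fails with -EINVAL. */
static int toy_output(const struct toy_rt *rt)
{
    return rt->n != NULL ? 0 : -EINVAL;
}
```

Starting from a parent route { "lo", NULL }, output through a COW copy succeeds, output through a clone fails with -EINVAL, and after toy_bind_lo_neigh() on the parent a fresh clone succeeds again.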
>
> Marcelo, any idea how to deal with this? My guess would be a revert, but I
> don't know the impact on the tproxy issue.
Good question :) Nothing so far, sorry.
The impact would be a return to the previous state, in which a tproxy server is
limited by the neighbor cache size. Simply making the cache larger is not a good
option, as it introduces big latency spikes during cleanup.
I'll have to rebuild the tproxy environment I used to test this, which
will take a while. I'll keep you posted.
Cheers,
Marcelo
Thread overview: 17+ messages
2013-01-30 8:26 [patch net] ipv6: do not create neighbor entries for local delivery Jiri Pirko
2013-01-31 1:26 ` David Miller
2013-08-08 18:45 ` Debabrata Banerjee
2013-08-08 19:01 ` Hannes Frederic Sowa
2013-08-08 19:02 ` Marcelo Ricardo Leitner
2013-08-08 19:06 ` Hannes Frederic Sowa
2013-08-08 19:11 ` Marcelo Ricardo Leitner
2013-08-08 19:16 ` Hannes Frederic Sowa
2013-08-08 19:23 ` Marcelo Ricardo Leitner
2013-08-08 19:19 ` Debabrata Banerjee
2013-08-08 19:47 ` Hannes Frederic Sowa
2013-08-08 20:16 ` Hannes Frederic Sowa
2013-08-08 20:45 ` Marcelo Ricardo Leitner [this message]
2013-08-08 20:46 ` Marcelo Ricardo Leitner
2013-08-12 18:09 ` Marcelo Ricardo Leitner
2013-08-12 22:26 ` Hannes Frederic Sowa
2013-08-13 12:48 ` Marcelo Ricardo Leitner