From: Marcelo Ricardo Leitner <mleitner@redhat.com>
To: Debabrata Banerjee <dbavatar@gmail.com>,
Jiri Pirko <jiri@resnulli.us>,
"davem@davemloft.net" <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
"jmorris@namei.org" <jmorris@namei.org>,
"yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>,
Patrick McHardy <kaber@trash.net>,
"Banerjee, Debabrata" <dbanerje@akamai.com>,
Joshua Hunt <johunt@akamai.com>
Subject: Re: [patch net] ipv6: do not create neighbor entries for local delivery
Date: Mon, 12 Aug 2013 15:09:19 -0300
Message-ID: <520924CF.6000805@redhat.com>
In-Reply-To: <20130808201627.GI14001@order.stressinduktion.org>
[-- Attachment #1: Type: text/plain, Size: 4060 bytes --]
On 08-08-2013 17:16, Hannes Frederic Sowa wrote:
> On Thu, Aug 08, 2013 at 09:47:02PM +0200, Hannes Frederic Sowa wrote:
>> On Thu, Aug 08, 2013 at 02:45:40PM -0400, Debabrata Banerjee wrote:
>>> On Wed, Jan 30, 2013 at 3:26 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>
>>>> They will be created at output, if ever needed. This avoids creating
>>>> empty neighbor entries when TPROXYing/Forwarding packets for addresses
>>>> that are not even directly reachable.
>>>>
>>>> Note that IPv4 already handles it this way. No neighbor entries are
>>>> created for local input.
>>>>
>>>> Tested by myself and customer.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>> ---
>>>> net/ipv6/route.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>>> index e229a3b..363d8b7 100644
>>>> --- a/net/ipv6/route.c
>>>> +++ b/net/ipv6/route.c
>>>> @@ -928,7 +928,7 @@ restart:
>>>>  	dst_hold(&rt->dst);
>>>>  	read_unlock_bh(&table->tb6_lock);
>>>>
>>>> -	if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
>>>> +	if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
>>>>  		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
>>>>  	else if (!(rt->dst.flags & DST_HOST))
>>>>  		nrt = rt6_alloc_clone(rt, &fl6->daddr);
>>>
>>>
>>>
>>> I'm not sure this patch is doing the right thing. It seems to break
>>> IPv6 loopback functionality, it is no longer equivalent to IPv4, as
>>> stated above. It doesn't just stop neighbor creation but it stops
>>> cached route creation. Seems like a scary change for a stable tree.
>>> See below:
>>>
>>> $ ip -4 route show local
>>> local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
>>>
>>> This local route enables us to use the whole loopback network, any
>>> address inside 127.0.0.0/8 will work.
>>>
>>> $ ping -c1 127.0.0.9
>>> PING 127.0.0.9 (127.0.0.9) 56(84) bytes of data.
>>> 64 bytes from 127.0.0.9: icmp_seq=1 ttl=64 time=0.012 ms
>>>
>>> --- 127.0.0.9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms
>>>
>>> This also used to work equivalently for IPv6 local loopback routes:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> 64 bytes from 2001::9: icmp_seq=1 ttl=64 time=0.010 ms
>>>
>>> --- 2001::9 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
>>>
>>> However with this patch, this is very broken:
>>>
>>> $ ip -6 route add local 2001::/64 dev lo
>>> $ ping6 -c1 2001::9
>>> PING 2001::9(2001::9) 56 data bytes
>>> ping: sendmsg: Invalid argument
>>
>> I do think that the patch above is fine. I wonder why you get a blackhole
>> route back here. Maybe backtracking in ip6_pol_route or in fib6_lookup_1 was
>> way too aggressive?
>
> Ah sorry, before the rt->n removal everything worked a bit
> differently. rt6_alloc_cow did fill rt->n back then. To fix both things
> we would have to bind a neighbour towards the loopback interface into
> the non-cloned rt6_info if it feeds packets towards lo. Pretty big change for
> old stable kernels, I guess. :/
>
> Marcelo, any idea how to deal with this? My guess would be a revert, but I
> don't know the impact on the tproxy issue.
Hannes, would something like this be acceptable? I'm hoping it's not too
ugly/hacky... As far as I could track back, the input and output routines were
merged mainly due to code similarity.

The TPROXY scenario must not create these neighbor entries on the INPUT path,
while Debabrata's ping test needs them on the OUTPUT path. This patch therefore
limits my previous patch to INPUT only.

Initial testing here looks good: TPROXY seems to be working as expected, and
so does the ping6 test.

What do you think?

Regards,
Marcelo
[-- Attachment #2: ipv6-rt.patch --]
[-- Type: text/x-patch, Size: 1914 bytes --]
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 18ea73c..603f9d9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -791,7 +791,7 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
 }
 
 static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int oif,
-				      struct flowi6 *fl6, int flags)
+				      struct flowi6 *fl6, int flags, int output)
 {
 	struct fib6_node *fn;
 	struct rt6_info *rt, *nrt;
@@ -799,8 +799,11 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 	int attempts = 3;
 	int err;
 	int reachable = net->ipv6.devconf_all->forwarding ? 0 : RT6_LOOKUP_F_REACHABLE;
+	int local = RTF_NONEXTHOP;
 
 	strict |= flags & RT6_LOOKUP_F_IFACE;
+	if (!output)
+		local |= RTF_LOCAL;
 
 relookup:
 	read_lock_bh(&table->tb6_lock);
@@ -820,7 +823,7 @@ restart:
 	read_unlock_bh(&table->tb6_lock);
 
 	if (!dst_get_neighbour_raw(&rt->dst)
-	    && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
+	    && !(rt->rt6i_flags & local))
 		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
 	else if (!(rt->dst.flags & DST_HOST))
 		nrt = rt6_alloc_clone(rt, &fl6->daddr);
@@ -864,7 +867,7 @@ out2:
 static struct rt6_info *ip6_pol_route_input(struct net *net, struct fib6_table *table,
 					    struct flowi6 *fl6, int flags)
 {
-	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags);
+	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags, 0);
 }
 
 void ip6_route_input(struct sk_buff *skb)
@@ -890,7 +893,7 @@ void ip6_route_input(struct sk_buff *skb)
 static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table,
 					     struct flowi6 *fl6, int flags)
 {
-	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
+	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags, 1);
 }
 
 struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk,
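
For readers who want to see the effect of the new "output" argument in isolation, here is a minimal userspace sketch of the condition the patch changes. It is not kernel code: needs_cow() and its has_neigh parameter are hypothetical names introduced only for illustration, and the flag values are stand-ins (only the fact that they are distinct bits matters here). The helper returns true where ip6_pol_route() would take the rt6_alloc_cow() branch.

```c
#include <stdbool.h>

/* Stand-in flag values for illustration; the real definitions are in the kernel headers. */
#define RTF_NONEXTHOP 0x00200000u
#define RTF_LOCAL     0x80000000u

/*
 * Hypothetical helper modeling the patched condition in ip6_pol_route().
 * has_neigh: whether the route already has a neighbour bound to it.
 * rt_flags:  the route's rt6i_flags.
 * output:    true for the output path, false for the input path.
 * Returns true when the lookup would go through rt6_alloc_cow().
 */
static bool needs_cow(bool has_neigh, unsigned int rt_flags, bool output)
{
	/* Flags that suppress the COW branch; mirrors "local" in the patch. */
	unsigned int skip = RTF_NONEXTHOP;

	/* On input, local routes are skipped as well, as TPROXY requires. */
	if (!output)
		skip |= RTF_LOCAL;

	return !has_neigh && !(rt_flags & skip);
}
```

Under these assumptions, needs_cow(false, RTF_LOCAL, false) is false (input path, the TPROXY case), while needs_cow(false, RTF_LOCAL, true) is true (output path, the ping6 case from Debabrata's report).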
Thread overview: 17+ messages
2013-01-30 8:26 [patch net] ipv6: do not create neighbor entries for local delivery Jiri Pirko
2013-01-31 1:26 ` David Miller
2013-08-08 18:45 ` Debabrata Banerjee
2013-08-08 19:01 ` Hannes Frederic Sowa
2013-08-08 19:02 ` Marcelo Ricardo Leitner
2013-08-08 19:06 ` Hannes Frederic Sowa
2013-08-08 19:11 ` Marcelo Ricardo Leitner
2013-08-08 19:16 ` Hannes Frederic Sowa
2013-08-08 19:23 ` Marcelo Ricardo Leitner
2013-08-08 19:19 ` Debabrata Banerjee
2013-08-08 19:47 ` Hannes Frederic Sowa
2013-08-08 20:16 ` Hannes Frederic Sowa
2013-08-08 20:45 ` Marcelo Ricardo Leitner
2013-08-08 20:46 ` Marcelo Ricardo Leitner
2013-08-12 18:09 ` Marcelo Ricardo Leitner [this message]
2013-08-12 22:26 ` Hannes Frederic Sowa
2013-08-13 12:48 ` Marcelo Ricardo Leitner