netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
@ 2015-10-05 14:32 David Ahern
  2015-10-07 11:25 ` David Miller
  2015-10-09  6:54 ` Hajime Tazaki
  0 siblings, 2 replies; 17+ messages in thread
From: David Ahern @ 2015-10-05 14:32 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, David Ahern

It occurred to me yesterday that 741a11d9e4103 ("net: ipv6: Add
RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup
needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes
the oif to be considered in lookups which is known to break vti. This
explains why 58189ca7b274 did not the IPv6 change at the time it was
submitted.

Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
This is needed in 4.3.

 net/ipv6/xfrm6_policy.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 30caa289c5db..5cedfda4b241 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
 
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = oif;
+	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
 	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
 	if (saddr)
 		memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-05 14:32 [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6 David Ahern
@ 2015-10-07 11:25 ` David Miller
  2015-10-07 14:12   ` David Ahern
  2015-10-09  6:54 ` Hajime Tazaki
  1 sibling, 1 reply; 17+ messages in thread
From: David Miller @ 2015-10-07 11:25 UTC (permalink / raw)
  To: dsa; +Cc: netdev, steffen.klassert

From: David Ahern <dsa@cumulusnetworks.com>
Date: Mon,  5 Oct 2015 08:32:51 -0600

> It occurred to me yesterday that 741a11d9e4103 ("net: ipv6: Add
> RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup
> needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes
> the oif to be considered in lookups which is known to break vti. This
> explains why 58189ca7b274 did not the IPv6 change at the time it was
> submitted.
> 
> Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

If this is needed in v4.3 why are you targetting net-next?

I applied it to both trees...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-07 11:25 ` David Miller
@ 2015-10-07 14:12   ` David Ahern
  0 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2015-10-07 14:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, steffen.klassert

On 10/7/15 5:25 AM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Mon,  5 Oct 2015 08:32:51 -0600
>
>> It occurred to me yesterday that 741a11d9e4103 ("net: ipv6: Add
>> RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup
>> needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes
>> the oif to be considered in lookups which is known to break vti. This
>> explains why 58189ca7b274 did not the IPv6 change at the time it was
>> submitted.
>>
>> Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>
> If this is needed in v4.3 why are you targetting net-next?

default git setting. Missed removal before send.

>
> I applied it to both trees...
>

thank you.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-05 14:32 [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6 David Ahern
  2015-10-07 11:25 ` David Miller
@ 2015-10-09  6:54 ` Hajime Tazaki
  2015-10-09  7:17   ` Steffen Klassert
  1 sibling, 1 reply; 17+ messages in thread
From: Hajime Tazaki @ 2015-10-09  6:54 UTC (permalink / raw)
  To: dsa; +Cc: netdev, steffen.klassert


Hello David,

At Mon,  5 Oct 2015 08:32:51 -0600,
David Ahern wrote:

> 
> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> index 30caa289c5db..5cedfda4b241 100644
> --- a/net/ipv6/xfrm6_policy.c
> +++ b/net/ipv6/xfrm6_policy.c
> @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
>  
>  	memset(&fl6, 0, sizeof(fl6));
>  	fl6.flowi6_oif = oif;
> +	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
>  	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
>  	if (saddr)
>  		memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));

I found that this fix is still not sufficient with the mip6
(Mobile IPv6) use case.

FLOWI_FLAG_SKIP_NH_OIF is not checked anywhere else in ipv6
code, in ip6_route_output() etc.

Even if I added the check (like below), MH packets are not
sent at all from mobile node, home agent.

do you have any idea ?

I have a reproducible setup here with mip6. let me know if
you need further information.


diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8c0898796ffb..0aba308b5ea3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1169,9 +1169,9 @@ struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
 
        fl6->flowi6_iif = LOOPBACK_IFINDEX;
 
-       if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl6->daddr))
+       if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl6->daddr) ||
+           (!(fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF) && fl6->flowi6_oif))
                flags |= RT6_LOOKUP_F_IFACE;

        if (!ipv6_addr_any(&fl6->saddr))
                flags |= RT6_LOOKUP_F_HAS_SADDR;
        else if (sk)

-- Hajime

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-09  6:54 ` Hajime Tazaki
@ 2015-10-09  7:17   ` Steffen Klassert
  2015-10-09 15:53     ` David Ahern
  2015-10-09 17:27     ` David Ahern
  0 siblings, 2 replies; 17+ messages in thread
From: Steffen Klassert @ 2015-10-09  7:17 UTC (permalink / raw)
  To: Hajime Tazaki; +Cc: dsa, netdev

On Fri, Oct 09, 2015 at 03:54:22PM +0900, Hajime Tazaki wrote:
> 
> Hello David,
> 
> At Mon,  5 Oct 2015 08:32:51 -0600,
> David Ahern wrote:
> 
> > 
> > diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> > index 30caa289c5db..5cedfda4b241 100644
> > --- a/net/ipv6/xfrm6_policy.c
> > +++ b/net/ipv6/xfrm6_policy.c
> > @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
> >  
> >  	memset(&fl6, 0, sizeof(fl6));
> >  	fl6.flowi6_oif = oif;
> > +	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
> >  	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
> >  	if (saddr)
> >  		memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
> 
> I found that this fix is still not sufficient with the mip6
> (Mobile IPv6) use case.

It does not even fix the vti case. The behaviour of the vti devices is
the same, with and without the patch.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-09  7:17   ` Steffen Klassert
@ 2015-10-09 15:53     ` David Ahern
  2015-10-09 17:27     ` David Ahern
  1 sibling, 0 replies; 17+ messages in thread
From: David Ahern @ 2015-10-09 15:53 UTC (permalink / raw)
  To: Steffen Klassert, Hajime Tazaki; +Cc: netdev

On 10/9/15 1:17 AM, Steffen Klassert wrote:
> On Fri, Oct 09, 2015 at 03:54:22PM +0900, Hajime Tazaki wrote:
>>
>> Hello David,
>>
>> At Mon,  5 Oct 2015 08:32:51 -0600,
>> David Ahern wrote:
>>
>>>
>>> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
>>> index 30caa289c5db..5cedfda4b241 100644
>>> --- a/net/ipv6/xfrm6_policy.c
>>> +++ b/net/ipv6/xfrm6_policy.c
>>> @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
>>>
>>>   	memset(&fl6, 0, sizeof(fl6));
>>>   	fl6.flowi6_oif = oif;
>>> +	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
>>>   	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
>>>   	if (saddr)
>>>   		memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
>>
>> I found that this fix is still not sufficient with the mip6
>> (Mobile IPv6) use case.
>
> It does not even fix the vti case. The behaviour of the vti devices is
> the same, with and without the patch.
>

I goofed this patch was on top of my IPv6 VRF patches. You need the 
FLOWI_FLAG_SKIP_NH_OIF bits from:

     http://www.spinics.net/lists/netdev/msg346860.html

Let me cook up a patch based on Linus' tree which is where it is needed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-09  7:17   ` Steffen Klassert
  2015-10-09 15:53     ` David Ahern
@ 2015-10-09 17:27     ` David Ahern
  2015-10-11 13:22       ` Hajime Tazaki
  2015-10-12 18:49       ` David Ahern
  1 sibling, 2 replies; 17+ messages in thread
From: David Ahern @ 2015-10-09 17:27 UTC (permalink / raw)
  To: Steffen Klassert, Hajime Tazaki; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]

On 10/9/15 1:17 AM, Steffen Klassert wrote:
>>> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
>>> index 30caa289c5db..5cedfda4b241 100644
>>> --- a/net/ipv6/xfrm6_policy.c
>>> +++ b/net/ipv6/xfrm6_policy.c
>>> @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
>>>
>>>   	memset(&fl6, 0, sizeof(fl6));
>>>   	fl6.flowi6_oif = oif;
>>> +	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
>>>   	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
>>>   	if (saddr)
>>>   		memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
>>
>> I found that this fix is still not sufficient with the mip6
>> (Mobile IPv6) use case.
>
> It does not even fix the vti case. The behaviour of the vti devices is
> the same, with and without the patch.
>

The attached patch applied to Linus' tree works for me. Currently the 
above change is not in his tree, so I added it to this patch. Once you 
confirm that it works for you I'll create the delta-patch for net and 
send out.

Thanks,
David

[-- Attachment #2: really-fix-vti6-with-oif.patch --]
[-- Type: text/plain, Size: 1506 bytes --]

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 92b1aa38f121..2dbd73014a1b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -874,7 +874,8 @@ static struct dst_entry *ip6_sk_dst_check(struct sock *sk,
 #ifdef CONFIG_IPV6_SUBTREES
 	    ip6_rt_check(&rt->rt6i_src, &fl6->saddr, np->saddr_cache) ||
 #endif
-	    (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex)) {
+	   (!(fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF) &&
+	      (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex))) {
 		dst_release(dst);
 		dst = NULL;
 	}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index cb32ce250db0..df24cff4a0cb 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1068,6 +1068,9 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 	saved_fn = fn;
 
+	if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
+		oif = 0;
+
 redo_rt6_select:
 	rt = rt6_select(fn, oif, strict);
 	if (rt->rt6i_nsiblings)
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 30caa289c5db..5cedfda4b241 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
 
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = oif;
+	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
 	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
 	if (saddr)
 		memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-09 17:27     ` David Ahern
@ 2015-10-11 13:22       ` Hajime Tazaki
  2015-10-11 13:31         ` David Ahern
  2015-10-12 18:49       ` David Ahern
  1 sibling, 1 reply; 17+ messages in thread
From: Hajime Tazaki @ 2015-10-11 13:22 UTC (permalink / raw)
  To: dsa; +Cc: steffen.klassert, netdev


At Fri, 9 Oct 2015 11:27:36 -0600,
David Ahern wrote:
> 
> [1  <text/plain; windows-1252 (7bit)>]
> On 10/9/15 1:17 AM, Steffen Klassert wrote:
> >>> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> >>> index 30caa289c5db..5cedfda4b241 100644
> >>> --- a/net/ipv6/xfrm6_policy.c
> >>> +++ b/net/ipv6/xfrm6_policy.c
> >>> @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
> >>>
> >>>   	memset(&fl6, 0, sizeof(fl6));
> >>>   	fl6.flowi6_oif = oif;
> >>> +	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
> >>>   	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
> >>>   	if (saddr)
> >>>   		memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
> >>
> >> I found that this fix is still not sufficient with the mip6
> >> (Mobile IPv6) use case.
> >
> > It does not even fix the vti case. The behaviour of the vti devices is
> > the same, with and without the patch.
> >
> 
> The attached patch applied to Linus' tree works for me. Currently the 
> above change is not in his tree, so I added it to this patch. Once you 
> confirm that it works for you I'll create the delta-patch for net and 
> send out.

I gave it a try but without any luck unfortunately.
I may need to look carefully what mip6 does here.

the code path where I'm looking at for MH packet (raw socket
with IPPROTO_MH) is:

#0  ip6_route_output (net=0x7ffff3a40a40 <rumpns_init_net>, sk=0x7ffff30b5c50, fl6=0x7ffff0ed2fe0) at net/ipv6/route.c:1195
#1  0x00007ffff35d155f in ip6_dst_lookup_tail (net=0x7ffff3a40a40 <rumpns_init_net>, sk=0x7ffff30b5c50, dst=0x7ffff0ed2f18, fl6=0x7ffff0ed2fe0)
    at net/ipv6/ip6_output.c:929
#2  0x00007ffff35d1707 in ip6_dst_lookup_flow (sk=0x7ffff30b5c50, fl6=0x7ffff0ed2fe0, final_dst=0x0) at net/ipv6/ip6_output.c:1024
#3  0x00007ffff36199f7 in rawv6_sendmsg (sk=0x7ffff30b5c50, msg=0x7ffff0ed3320, len=32) at net/ipv6/raw.c:872
#4  0x00007ffff3526d21 in inet_sendmsg (sock=0x7ffff30b5810, msg=0x7ffff0ed3320, size=32) at net/ipv4/af_inet.c:737
#5  0x00007ffff338dc8a in sock_sendmsg_nosec (msg=0x7ffff0ed3320, sock=0x7ffff30b5810) at net/socket.c:610
#6  sock_sendmsg (sock=0x7ffff30b5810, msg=0x7ffff0ed3320) at net/socket.c:620


which (*dst)->error of ip6_route_output gives -22 (EINVAL).

I will be back once I got any findings.

-- Hajime

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-11 13:22       ` Hajime Tazaki
@ 2015-10-11 13:31         ` David Ahern
  2015-10-11 14:24           ` Hajime Tazaki
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2015-10-11 13:31 UTC (permalink / raw)
  To: Hajime Tazaki; +Cc: steffen.klassert, netdev

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

On 10/11/15 7:22 AM, Hajime Tazaki wrote:
>
> At Fri, 9 Oct 2015 11:27:36 -0600,
> David Ahern wrote:
>>
>> [1  <text/plain; windows-1252 (7bit)>]
>> On 10/9/15 1:17 AM, Steffen Klassert wrote:
>>>>> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
>>>>> index 30caa289c5db..5cedfda4b241 100644
>>>>> --- a/net/ipv6/xfrm6_policy.c
>>>>> +++ b/net/ipv6/xfrm6_policy.c
>>>>> @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
>>>>>
>>>>>    	memset(&fl6, 0, sizeof(fl6));
>>>>>    	fl6.flowi6_oif = oif;
>>>>> +	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
>>>>>    	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
>>>>>    	if (saddr)
>>>>>    		memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
>>>>
>>>> I found that this fix is still not sufficient with the mip6
>>>> (Mobile IPv6) use case.
>>>
>>> It does not even fix the vti case. The behaviour of the vti devices is
>>> the same, with and without the patch.
>>>
>>
>> The attached patch applied to Linus' tree works for me. Currently the
>> above change is not in his tree, so I added it to this patch. Once you
>> confirm that it works for you I'll create the delta-patch for net and
>> send out.
>
> I gave it a try but without any luck unfortunately.
> I may need to look carefully what mip6 does here.
>
> the code path where I'm looking at for MH packet (raw socket
> with IPPROTO_MH) is:
>
> #0  ip6_route_output (net=0x7ffff3a40a40 <rumpns_init_net>, sk=0x7ffff30b5c50, fl6=0x7ffff0ed2fe0) at net/ipv6/route.c:1195
> #1  0x00007ffff35d155f in ip6_dst_lookup_tail (net=0x7ffff3a40a40 <rumpns_init_net>, sk=0x7ffff30b5c50, dst=0x7ffff0ed2f18, fl6=0x7ffff0ed2fe0)
>      at net/ipv6/ip6_output.c:929
> #2  0x00007ffff35d1707 in ip6_dst_lookup_flow (sk=0x7ffff30b5c50, fl6=0x7ffff0ed2fe0, final_dst=0x0) at net/ipv6/ip6_output.c:1024
> #3  0x00007ffff36199f7 in rawv6_sendmsg (sk=0x7ffff30b5c50, msg=0x7ffff0ed3320, len=32) at net/ipv6/raw.c:872
> #4  0x00007ffff3526d21 in inet_sendmsg (sock=0x7ffff30b5810, msg=0x7ffff0ed3320, size=32) at net/ipv4/af_inet.c:737
> #5  0x00007ffff338dc8a in sock_sendmsg_nosec (msg=0x7ffff0ed3320, sock=0x7ffff30b5810) at net/socket.c:610
> #6  sock_sendmsg (sock=0x7ffff30b5810, msg=0x7ffff0ed3320) at net/socket.c:620
>
>
> which (*dst)->error of ip6_route_output gives -22 (EINVAL).
>
> I will be back once I got any findings.

How does a xfrm change affect this code path?

Can you apply this patch, and then run:

perf record -e fib6:* -a -g
perf script

Thanks,
David



[-- Attachment #2: 0001-net-IPv6-fib-and-route-tracepoints.patch --]
[-- Type: text/plain, Size: 10292 bytes --]

>From 5da83796c110d5f2c995b22b38be8e5268a7f573 Mon Sep 17 00:00:00 2001
From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu, 1 Oct 2015 14:52:58 -0700
Subject: [PATCH net-next] net: IPv6 fib and route tracepoints

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/trace/events/fib6.h | 237 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/net-traces.c       |   7 ++
 net/ipv6/route.c            |  24 ++++-
 3 files changed, 267 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/fib6.h

diff --git a/include/trace/events/fib6.h b/include/trace/events/fib6.h
new file mode 100644
index 000000000000..9f34da0d59a3
--- /dev/null
+++ b/include/trace/events/fib6.h
@@ -0,0 +1,237 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fib6
+
+#if !defined(_TRACE_FIB6_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FIB6_H
+
+#include <linux/in6.h>
+#include <net/flow.h>
+#include <net/ip6_fib.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(fib6_table_lookup,
+
+	TP_PROTO(const struct net *net, const struct rt6_info *rt,
+		 u32 tb_id, const struct flowi6 *flp),
+
+	TP_ARGS(net, rt, tb_id, flp),
+
+	TP_STRUCT__entry(
+		__field(	u32,	tb_id		)
+
+		__field(	int,	oif		)
+		__field(	int,	iif		)
+		__field(	__u8,	tos		)
+		__field(	__u8,	scope		)
+		__field(	__u8,	flags		)
+		__array(	__u8,	src,	16	)
+		__array(	__u8,	dst,	16	)
+
+		__dynamic_array(	char,	name,	IFNAMSIZ )
+		__array(		__u8,	gw,	16	 )
+	),
+
+	TP_fast_assign(
+		struct in6_addr *in6;
+
+		__entry->tb_id = tb_id;
+		__entry->oif = flp->flowi6_oif;
+		__entry->iif = flp->flowi6_iif;
+		__entry->tos = flp->flowi6_tos;
+		__entry->scope = flp->flowi6_scope;
+		__entry->flags = flp->flowi6_flags;
+
+		in6 = (struct in6_addr *)__entry->src;
+		*in6 = flp->saddr;
+
+		in6 = (struct in6_addr *)__entry->dst;
+		*in6 = flp->daddr;
+
+		if (rt->rt6i_idev) {
+			__assign_str(name, rt->rt6i_idev->dev->name);
+		} else {
+			__assign_str(name, "");
+		}
+		if (rt == net->ipv6.ip6_null_entry) {
+			struct in6_addr in6_zero = {};
+
+			in6 = (struct in6_addr *)__entry->gw;
+			*in6 = in6_zero;
+
+		} else if (rt) {
+			in6 = (struct in6_addr *)__entry->gw;
+			*in6 = rt->rt6i_gateway;
+		}
+	),
+
+	TP_printk("table %3u oif %d iif %d src %pI6c dst %pI6c tos %d scope %d flags %x ==> dev %s gw %pI6c",
+		  __entry->tb_id, __entry->oif, __entry->iif,
+		  __entry->src, __entry->dst, __entry->tos, __entry->scope,
+		  __entry->flags, __get_str(name), __entry->gw)
+);
+
+TRACE_EVENT(ip6_route_del,
+
+	TP_PROTO(const struct fib6_config *cfg),
+
+	TP_ARGS(cfg),
+
+	TP_STRUCT__entry(
+		__field(	u32,	tb_id		)
+		__field(	int,	iif		)
+		__field(	int,	src_len		)
+		__field(	int,	dst_len		)
+		__array(	__u8,	src,	 16	)
+		__array(	__u8,	dst,	 16	)
+		__array(	__u8,	prefsrc, 16	)
+		__array(	__u8,	gw,	 16	)
+	),
+
+	TP_fast_assign(
+		struct in6_addr *in6;
+
+		__entry->tb_id = cfg->fc_table;
+		__entry->iif = cfg->fc_ifindex;
+
+		in6 = (struct in6_addr *)__entry->src;
+		*in6 = cfg->fc_src;
+		__entry->src_len = cfg->fc_src_len;
+
+		in6 = (struct in6_addr *)__entry->dst;
+		*in6 = cfg->fc_dst;
+		__entry->dst_len = cfg->fc_dst_len;
+
+		in6 = (struct in6_addr *)__entry->prefsrc;
+		*in6 = cfg->fc_prefsrc;
+
+		in6 = (struct in6_addr *)__entry->gw;
+		*in6 = cfg->fc_gateway;
+	),
+
+	TP_printk("table %u ifindex %d src %pI6c/%d dst %pI6c/%d prefsrc %pI6c gw %pI6c",
+		  __entry->tb_id, __entry->iif, __entry->src, __entry->src_len,
+		  __entry->dst, __entry->dst_len, __entry->prefsrc, __entry->gw)
+);
+
+TRACE_EVENT(ipv6_alloc_dst,
+
+	TP_PROTO(const struct rt6_info *rt),
+
+	TP_ARGS(rt),
+
+	TP_STRUCT__entry(
+		__field(	const void *,	dst		)
+		__field(	void *,	input		)
+		__field(	void *,	output		)
+		__field(	__u32,	flags		)
+		__field(	__u32,	tb_id		)
+		__dynamic_array(	char,	name,	IFNAMSIZ )
+		__array(	__u8,	gw,	16	)
+		__array(	__u8,	addr,	16	)
+		__field(	__u32,	plen		)
+	),
+
+	TP_fast_assign(
+		struct in6_addr *in6;
+
+		__entry->dst = &rt->dst;
+		__entry->input = rt->dst.input;
+		__entry->output = rt->dst.output;
+		__entry->flags = rt->rt6i_flags;
+		if (rt->rt6i_table) {
+			__entry->tb_id = rt->rt6i_table->tb6_id;
+		} else {
+			__entry->tb_id = 0;
+		}
+		in6 = (struct in6_addr *)__entry->gw;
+		*in6 = rt->rt6i_gateway;
+
+		in6 = (struct in6_addr *)__entry->addr;
+		*in6 = rt->rt6i_dst.addr;
+		__entry->plen = rt->rt6i_dst.plen;
+
+		if (rt->rt6i_idev) {
+			__assign_str(name, rt->rt6i_idev->dev->name);
+		} else {
+			__assign_str(name, "<null>");
+		}
+	),
+
+	TP_printk("table %3u flags %x dev %s gw %pI6c addr %pI6c/%d dst %p input %p output %p",
+		  __entry->tb_id, __entry->flags, __get_str(name),
+		  __entry->gw, __entry->addr, __entry->plen,
+		  __entry->dst, __entry->input, __entry->output)
+);
+
+TRACE_EVENT(ipv6_alloc_pcpu_dst,
+
+	TP_PROTO(const struct rt6_info *rt, const struct rt6_info *rt_orig),
+
+	TP_ARGS(rt, rt_orig),
+
+	TP_STRUCT__entry(
+		__field(	const void *,	dst_orig	)
+		__field(	const void *,	dst		)
+	),
+
+	TP_fast_assign(
+		__entry->dst_orig = &rt_orig->dst;
+		__entry->dst = &rt->dst;
+	),
+
+	TP_printk("dst %p dst orig %p",
+		  __entry->dst, __entry->dst_orig)
+);
+
+TRACE_EVENT(fib6_ifdown,
+
+	TP_PROTO(const struct net_device *dev, const struct rt6_info *rt),
+
+	TP_ARGS(dev, rt),
+
+	TP_STRUCT__entry(
+		__field(	const void *,	dst		)
+		__field(	__u32,	flags		)
+		__field(	__u32,	tb_id		)
+		__dynamic_array(	char,	dev,	IFNAMSIZ )
+		__dynamic_array(	char,	rt_dev,	IFNAMSIZ )
+		__array(	__u8,	gw,	16	)
+		__array(	__u8,	addr,	16	)
+		__field(	__u32,	plen		)
+	),
+
+	TP_fast_assign(
+		struct in6_addr *in6;
+
+		__entry->dst = &rt->dst;
+		__entry->flags = rt->rt6i_flags;
+		if (rt->rt6i_table) {
+			__entry->tb_id = rt->rt6i_table->tb6_id;
+		} else {
+			__entry->tb_id = 0;
+		}
+		in6 = (struct in6_addr *)__entry->gw;
+		*in6 = rt->rt6i_gateway;
+
+		in6 = (struct in6_addr *)__entry->addr;
+		*in6 = rt->rt6i_dst.addr;
+		__entry->plen = rt->rt6i_dst.plen;
+
+		__assign_str(dev, dev->name);
+		if (rt->rt6i_idev) {
+			__assign_str(rt_dev, rt->rt6i_idev->dev->name);
+		} else {
+			__assign_str(rt_dev, "<null>");
+		}
+	),
+
+	TP_printk("dev %s table %3u flags %x rt_dev %s gw %pI6c addr %pI6c/%d dst %p",
+		  __get_str(dev), __entry->tb_id, __entry->flags, __get_str(rt_dev),
+		  __entry->gw, __entry->addr, __entry->plen, __entry->dst)
+
+);
+
+#endif /* _TRACE_FIB6_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index adef015b2f41..a4ee389237e5 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -32,6 +32,13 @@
 #include <trace/events/sock.h>
 #include <trace/events/udp.h>
 #include <trace/events/fib.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <trace/events/fib6.h>
+EXPORT_TRACEPOINT_SYMBOL_GPL(fib6_table_lookup);
+EXPORT_TRACEPOINT_SYMBOL_GPL(ipv6_alloc_pcpu_dst);
+EXPORT_TRACEPOINT_SYMBOL_GPL(ipv6_alloc_dst);
+EXPORT_TRACEPOINT_SYMBOL_GPL(ip6_route_del);
+#endif
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 63446bae7f5f..781651a2cf01 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -62,6 +62,7 @@
 #include <net/lwtunnel.h>
 #include <net/ip_tunnels.h>
 #include <net/l3mdev.h>
+#include <trace/events/fib6.h>
 
 #include <asm/uaccess.h>
 
@@ -857,6 +858,9 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
 	}
 	dst_use(&rt->dst, jiffies);
 	read_unlock_bh(&table->tb6_lock);
+
+	trace_fib6_table_lookup(net, rt, table->tb6_id, fl6);
+
 	return rt;
 
 }
@@ -958,6 +962,7 @@ static struct rt6_info *ip6_rt_cache_alloc(struct rt6_info *ort,
 #endif
 	}
 
+	trace_ipv6_alloc_dst(rt);
 	return rt;
 }
 
@@ -973,6 +978,8 @@ static struct rt6_info *ip6_rt_pcpu_alloc(struct rt6_info *rt)
 	ip6_rt_copy_init(pcpu_rt, rt);
 	pcpu_rt->rt6i_protocol = rt->rt6i_protocol;
 	pcpu_rt->rt6i_flags |= RTF_PCPU;
+
+	trace_ipv6_alloc_pcpu_dst(pcpu_rt, rt);
 	return pcpu_rt;
 }
 
@@ -1070,6 +1077,8 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 		read_unlock_bh(&table->tb6_lock);
 
 		rt6_dst_from_metrics_check(rt);
+
+		trace_fib6_table_lookup(net, rt, table->tb6_id, fl6);
 		return rt;
 	} else if (unlikely((fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH) &&
 			    !(rt->rt6i_flags & RTF_GATEWAY))) {
@@ -1093,6 +1102,8 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 			uncached_rt = net->ipv6.ip6_null_entry;
 
 		dst_hold(&uncached_rt->dst);
+
+		trace_fib6_table_lookup(net, uncached_rt, table->tb6_id, fl6);
 		return uncached_rt;
 
 	} else {
@@ -1117,6 +1128,7 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 			dst_release(&rt->dst);
 		}
 
+		trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6);
 		return pcpu_rt;
 
 	}
@@ -1460,6 +1472,7 @@ static struct rt6_info *__ip6_route_redirect(struct net *net,
 
 	read_unlock_bh(&table->tb6_lock);
 
+	trace_fib6_table_lookup(net, rt, table->tb6_id, fl6);
 	return rt;
 };
 
@@ -1600,6 +1613,8 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	rt->rt6i_idev     = idev;
 	dst_metric_set(&rt->dst, RTAX_HOPLIMIT, 0);
 
+	trace_ipv6_alloc_dst(rt);
+
 	spin_lock_bh(&icmp6_dst_lock);
 	rt->dst.next = icmp6_dst_gc_list;
 	icmp6_dst_gc_list = &rt->dst;
@@ -1967,6 +1982,8 @@ int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
 
 	cfg->fc_nlinfo.nl_net = dev_net(dev);
 
+	trace_ipv6_alloc_dst(rt);
+
 	*rt_ret = rt;
 
 	return 0;
@@ -2046,6 +2063,8 @@ static int ip6_route_del(struct fib6_config *cfg)
 	struct rt6_info *rt;
 	int err = -ESRCH;
 
+	trace_ip6_route_del(cfg);
+
 	table = fib6_get_table(cfg->fc_nlinfo.nl_net, cfg->fc_table);
 	if (!table)
 		return err;
@@ -2510,6 +2529,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 
 	atomic_set(&rt->dst.__refcnt, 1);
 
+	trace_ipv6_alloc_dst(rt);
 	return rt;
 }
 
@@ -2595,8 +2615,10 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
 	const struct net_device *dev = adn->dev;
 
 	if ((rt->dst.dev == dev || !dev) &&
-	    rt != adn->net->ipv6.ip6_null_entry)
+	    rt != adn->net->ipv6.ip6_null_entry) {
+		trace_fib6_ifdown(dev, rt);
 		return -1;
+	}
 
 	return 0;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-11 13:31         ` David Ahern
@ 2015-10-11 14:24           ` Hajime Tazaki
  2015-10-11 18:01             ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Hajime Tazaki @ 2015-10-11 14:24 UTC (permalink / raw)
  To: dsa; +Cc: steffen.klassert, netdev


At Sun, 11 Oct 2015 07:31:32 -0600,
David Ahern wrote:

> >> The attached patch applied to Linus' tree works for me. Currently the
> >> above change is not in his tree, so I added it to this patch. Once you
> >> confirm that it works for you I'll create the delta-patch for net and
> >> send out.
> >
> > I gave it a try but without any luck unfortunately.
> > I may need to look carefully what mip6 does here.
> >
> > the code path where I'm looking at for MH packet (raw socket
> > with IPPROTO_MH) is:
> >
> > #0  ip6_route_output (net=0x7ffff3a40a40 <rumpns_init_net>, sk=0x7ffff30b5c50, fl6=0x7ffff0ed2fe0) at net/ipv6/route.c:1195
> > #1  0x00007ffff35d155f in ip6_dst_lookup_tail (net=0x7ffff3a40a40 <rumpns_init_net>, sk=0x7ffff30b5c50, dst=0x7ffff0ed2f18, fl6=0x7ffff0ed2fe0)
> >      at net/ipv6/ip6_output.c:929
> > #2  0x00007ffff35d1707 in ip6_dst_lookup_flow (sk=0x7ffff30b5c50, fl6=0x7ffff0ed2fe0, final_dst=0x0) at net/ipv6/ip6_output.c:1024
> > #3  0x00007ffff36199f7 in rawv6_sendmsg (sk=0x7ffff30b5c50, msg=0x7ffff0ed3320, len=32) at net/ipv6/raw.c:872
> > #4  0x00007ffff3526d21 in inet_sendmsg (sock=0x7ffff30b5810, msg=0x7ffff0ed3320, size=32) at net/ipv4/af_inet.c:737
> > #5  0x00007ffff338dc8a in sock_sendmsg_nosec (msg=0x7ffff0ed3320, sock=0x7ffff30b5810) at net/socket.c:610
> > #6  sock_sendmsg (sock=0x7ffff30b5810, msg=0x7ffff0ed3320) at net/socket.c:620
> >
> >
> > which (*dst)->error of ip6_route_output gives -22 (EINVAL).
> >
> > I will be back once I got any findings.
> 
> How does a xfrm change affect this code path?

I've faced this issue since the following patch was applied.

commit 741a11d9e4103a8e1c590ef1280143fe654e4e33
Author: David Ahern <dsa@cumulusnetworks.com>
Date:   Mon Sep 28 10:12:13 2015 -0700

    net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set

I still couldn't spot which part (other than my posted call
graph) is broken and am not sure whether the xfrm change
affects or not (which I need to check the mip6 code again).

> Can you apply this patch, and then run:
> 
> perf record -e fib6:* -a -g
> perf script

I'm using libos environment right now, so the perf trace
can't be used as it is.

I'll keep post here.

-- Hajime

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-11 14:24           ` Hajime Tazaki
@ 2015-10-11 18:01             ` David Ahern
  2015-10-20 12:31               ` Hajime Tazaki
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2015-10-11 18:01 UTC (permalink / raw)
  To: Hajime Tazaki; +Cc: steffen.klassert, netdev

On 10/11/15 8:24 AM, Hajime Tazaki wrote:
>
> I've faced this issue since the following patch was applied.
>
> commit 741a11d9e4103a8e1c590ef1280143fe654e4e33
> Author: David Ahern <dsa@cumulusnetworks.com>
> Date:   Mon Sep 28 10:12:13 2015 -0700
>
>      net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set
>
> I still couldn't spot which part (other than my posted call
> graph) is broken and am not sure whether the xfrm change
> affects or not (which I need to check the mip6 code again).

Ok, this is a separate problem from what Steffen is hitting.

>
>> Can you apply this patch, and then run:
>>
>> perf record -e fib6:* -a -g
>> perf script
>
> I'm using libos environment right now, so the perf trace
> can't be used as it is.

ok.

Some path in raw6_sendmsg is setting fl6.flowi6_oif. Can you instrument it?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-09 17:27     ` David Ahern
  2015-10-11 13:22       ` Hajime Tazaki
@ 2015-10-12 18:49       ` David Ahern
  2015-10-13 14:34         ` Steffen Klassert
  2015-10-19  8:01         ` Steffen Klassert
  1 sibling, 2 replies; 17+ messages in thread
From: David Ahern @ 2015-10-12 18:49 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Hajime Tazaki, netdev

On 10/9/15 11:27 AM, David Ahern wrote:
> On 10/9/15 1:17 AM, Steffen Klassert wrote:
>>>> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
>>>> index 30caa289c5db..5cedfda4b241 100644
>>>> --- a/net/ipv6/xfrm6_policy.c
>>>> +++ b/net/ipv6/xfrm6_policy.c
>>>> @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct
>>>> net *net, int tos, int oif,
>>>>
>>>>       memset(&fl6, 0, sizeof(fl6));
>>>>       fl6.flowi6_oif = oif;
>>>> +    fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
>>>>       memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
>>>>       if (saddr)
>>>>           memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
>>>
>>> I found that this fix is still not sufficient with the mip6
>>> (Mobile IPv6) use case.
>>
>> It does not even fix the vti case. The behaviour of the vti devices is
>> the same, with and without the patch.
>>
>
> The attached patch applied to Linus' tree works for me. Currently the
> above change is not in his tree, so I added it to this patch. Once you
> confirm that it works for you I'll create the delta-patch for net and
> send out.

Steffen: Have you had a chance to try the patch? Does it solve the vti6 
problem for you?

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-12 18:49       ` David Ahern
@ 2015-10-13 14:34         ` Steffen Klassert
  2015-10-19  8:01         ` Steffen Klassert
  1 sibling, 0 replies; 17+ messages in thread
From: Steffen Klassert @ 2015-10-13 14:34 UTC (permalink / raw)
  To: David Ahern; +Cc: Hajime Tazaki, netdev

Hi David.

On Mon, Oct 12, 2015 at 12:49:29PM -0600, David Ahern wrote:
> On 10/9/15 11:27 AM, David Ahern wrote:
> >
> >The attached patch applied to Linus' tree works for me. Currently the
> >above change is not in his tree, so I added it to this patch. Once you
> >confirm that it works for you I'll create the delta-patch for net and
> >send out.
> 
> Steffen: Have you had a chance to try the patch? Does it solve the
> vti6 problem for you?
> 

I'm not at office this week, so not able to do a test
before monday. I let you know as soon as I tried the
patch.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-12 18:49       ` David Ahern
  2015-10-13 14:34         ` Steffen Klassert
@ 2015-10-19  8:01         ` Steffen Klassert
  1 sibling, 0 replies; 17+ messages in thread
From: Steffen Klassert @ 2015-10-19  8:01 UTC (permalink / raw)
  To: David Ahern; +Cc: Hajime Tazaki, netdev

On Mon, Oct 12, 2015 at 12:49:29PM -0600, David Ahern wrote:
> On 10/9/15 11:27 AM, David Ahern wrote:
> >On 10/9/15 1:17 AM, Steffen Klassert wrote:
> >>>>diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> >>>>index 30caa289c5db..5cedfda4b241 100644
> >>>>--- a/net/ipv6/xfrm6_policy.c
> >>>>+++ b/net/ipv6/xfrm6_policy.c
> >>>>@@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct
> >>>>net *net, int tos, int oif,
> >>>>
> >>>>      memset(&fl6, 0, sizeof(fl6));
> >>>>      fl6.flowi6_oif = oif;
> >>>>+    fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
> >>>>      memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
> >>>>      if (saddr)
> >>>>          memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
> >>>
> >>>I found that this fix is still not sufficient with the mip6
> >>>(Mobile IPv6) use case.
> >>
> >>It does not even fix the vti case. The behaviour of the vti devices is
> >>the same, with and without the patch.
> >>
> >
> >The attached patch applied to Linus' tree works for me. Currently the
> >above change is not in his tree, so I added it to this patch. Once you
> >confirm that it works for you I'll create the delta-patch for net and
> >send out.
> 
> Steffen: Have you had a chance to try the patch? Does it solve the
> vti6 problem for you?

The delta of your patch and the current net tree fixes the
vti6 problems.

Current net-next works without any changes.

Thanks David!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-11 18:01             ` David Ahern
@ 2015-10-20 12:31               ` Hajime Tazaki
  2015-10-20 17:48                 ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Hajime Tazaki @ 2015-10-20 12:31 UTC (permalink / raw)
  To: dsa; +Cc: steffen.klassert, netdev


Hello David,

sorry for the delay.

At Sun, 11 Oct 2015 12:01:30 -0600,
David Ahern wrote:
> 
> On 10/11/15 8:24 AM, Hajime Tazaki wrote:
> >
> > I've faced this issue since the following patch was applied.
> >
> > commit 741a11d9e4103a8e1c590ef1280143fe654e4e33
> > Author: David Ahern <dsa@cumulusnetworks.com>
> > Date:   Mon Sep 28 10:12:13 2015 -0700
> >
> >      net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set
> >
> > I still couldn't spot which part (other than my posted call
> > graph) is broken and am not sure whether the xfrm change
> > affects or not (which I need to check the mip6 code again).
> 
> Ok, this is a separate problem from what Steffen is hitting.

agree.

> >
> >> Can you apply this patch, and then run:
> >>
> >> perf record -e fib6:* -a -g
> >> perf script
> >
> > I'm using libos environment right now, so the perf trace
> > can't be used as it is.
> 
> ok.
> 
> Some path in raw6_sendmsg is setting fl6.flowi6_oif. Can you instrument it?

yes, this sendmsg uses non-zero flowi6_oif.


the conditions are

- sendmsg () with INET6/RAW socket (with IPPROTO_MH)
- ip6_pktinfo.ipi6_addr (fl6.saddr) and ipi6_oif
  (fl6.flowi6_oif) are non-NULL.
=> ipi6_addr (fl6.saddr) is not the IP address of oif, but
    another interfaces (home address of mip6)
- fib6_lookup() (in ip6_pol_route()) gives ip6_null_entry

if RT6_LOOKUP_F_IFACE isn't set at ip6_route_output, it
can look for a proper dst_entry of default route
if not, it gives EINVAL.

I'm sure that this is not the right fix for this issue, but
the following patch solves my situation.

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index df24cff4a0cb..02e86989b3cb 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1079,6 +1079,12 @@ redo_rt6_select:
                fn = fib6_backtrack(fn, &fl6->saddr);
                if (fn) 
                        goto redo_rt6_select;
+               else if (strict & RT6_LOOKUP_F_IFACE) {
+                       /* also consider non-interface route */
+                       strict &= ~RT6_LOOKUP_F_IFACE;
+                       fn = saved_fn;
+                       goto redo_rt6_select;
+               }
                else if (strict & RT6_LOOKUP_F_REACHABLE) {
                        /* also consider unreachable route */
                        strict &= ~RT6_LOOKUP_F_REACHABLE;

I'm trying to create a minimum reproducible code and spot
the issue but not get it yet. let me know if you find any
good idea.

-- Hajime

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-20 12:31               ` Hajime Tazaki
@ 2015-10-20 17:48                 ` David Ahern
  2015-10-21  2:38                   ` Hajime Tazaki
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2015-10-20 17:48 UTC (permalink / raw)
  To: Hajime Tazaki; +Cc: steffen.klassert, netdev

[-- Attachment #1: Type: text/plain, Size: 503 bytes --]

On 10/20/15 6:31 AM, Hajime Tazaki wrote:
> yes, this sendmsg uses non-zero flowi6_oif.
>
>
> the conditions are
>
> - sendmsg () with INET6/RAW socket (with IPPROTO_MH)
> - ip6_pktinfo.ipi6_addr (fl6.saddr) and ipi6_oif
>    (fl6.flowi6_oif) are non-NULL.
> => ipi6_addr (fl6.saddr) is not the IP address of oif, but
>      another interfaces (home address of mip6)

interesting. so forcing a send out of interface X but using the source 
address of interface Y.

Does the attached patch work for you?

[-- Attachment #2: ipv6-oif-with-saddr.patch --]
[-- Type: text/plain, Size: 848 bytes --]

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d0619632723a..2701cb3d88e9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1171,6 +1171,7 @@ struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
 {
 	struct dst_entry *dst;
 	int flags = 0;
+	bool any_src;
 
 	dst = l3mdev_rt6_dst_by_oif(net, fl6);
 	if (dst)
@@ -1178,11 +1179,12 @@ struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
 
 	fl6->flowi6_iif = LOOPBACK_IFINDEX;
 
+	any_src = ipv6_addr_any(&fl6->saddr);
 	if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl6->daddr) ||
-	    fl6->flowi6_oif)
+	    (fl6->flowi6_oif && any_src))
 		flags |= RT6_LOOKUP_F_IFACE;
 
-	if (!ipv6_addr_any(&fl6->saddr))
+	if (!any_src)
 		flags |= RT6_LOOKUP_F_HAS_SADDR;
 	else if (sk)
 		flags |= rt6_srcprefs2flags(inet6_sk(sk)->srcprefs);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
  2015-10-20 17:48                 ` David Ahern
@ 2015-10-21  2:38                   ` Hajime Tazaki
  0 siblings, 0 replies; 17+ messages in thread
From: Hajime Tazaki @ 2015-10-21  2:38 UTC (permalink / raw)
  To: dsa; +Cc: steffen.klassert, netdev


At Tue, 20 Oct 2015 11:48:48 -0600,
David Ahern wrote:

> > the conditions are
> >
> > - sendmsg () with INET6/RAW socket (with IPPROTO_MH)
> > - ip6_pktinfo.ipi6_addr (fl6.saddr) and ipi6_oif
> >    (fl6.flowi6_oif) are non-NULL.
> > => ipi6_addr (fl6.saddr) is not the IP address of oif, but
> >      another interfaces (home address of mip6)
> 
> interesting. so forcing a send out of interface X but using the source 
> address of interface Y.
> 
> Does the attached patch work for you?

this patch works fine for the above case.
other tests of mine are also fine.

many thanks !

-- Hajime

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-10-21  2:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-05 14:32 [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6 David Ahern
2015-10-07 11:25 ` David Miller
2015-10-07 14:12   ` David Ahern
2015-10-09  6:54 ` Hajime Tazaki
2015-10-09  7:17   ` Steffen Klassert
2015-10-09 15:53     ` David Ahern
2015-10-09 17:27     ` David Ahern
2015-10-11 13:22       ` Hajime Tazaki
2015-10-11 13:31         ` David Ahern
2015-10-11 14:24           ` Hajime Tazaki
2015-10-11 18:01             ` David Ahern
2015-10-20 12:31               ` Hajime Tazaki
2015-10-20 17:48                 ` David Ahern
2015-10-21  2:38                   ` Hajime Tazaki
2015-10-12 18:49       ` David Ahern
2015-10-13 14:34         ` Steffen Klassert
2015-10-19  8:01         ` Steffen Klassert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).